summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
...
* powerpc/kexec: Disable hard IRQ before kexecPhileas Fogg2013-02-281-0/+5
| | | | | | | | | | | | | commit 8520e443aa56cc157b015205ea53e7b9fc831291 upstream. Disable hard IRQ before kexec a new kernel image. Not doing it can result in corrupted data in the memory segments reserved for the new kernel. Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: PXA3xx: program the CSMSADRCFG registerIgor Grinberg2013-02-282-1/+15
| | | | | | | | | | | | | | | | | | | commit d107a204154ddd79339203c2deeb7433f0cf6777 upstream. The Chip Select Configuration Register must be programmed to 0x2 in order to achieve the correct behavior of the Static Memory Controller. Without this patch devices wired to DFI and accessed through SMC cannot be accessed after resume from S2. Do not rely on the boot loader to program the CSMSADRCFG register by programming it in the kernel smemc module. Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Acked-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/kvm: Fix store status for ACRS/FPRSChristian Borntraeger2013-02-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream. On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: The sie state descriptor doesnt have fields for guest ACRS,FPRS, those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Lets collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xen: Send spinlock IPI to all waitersStefan Bader2013-02-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 76eaca031f0af2bb303e405986f637811956a422 upstream. There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86-32, mm: Remove reference to resume_map_numa_kva()H. Peter Anvin2013-02-282-8/+0
| | | | | | | | | | | | commit bb112aec5ee41427e9b9726e3d57b896709598ed upstream. Remove reference to removed function resume_map_numa_kva(). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.Jan Beulich2013-02-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream. This fixes CVE-2013-0228 / XSA-42 Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user in 32bit PV guest can use to crash the > guest with the panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [<c08476df>] ? panic+0x6e/0x122 [<c084b63c>] ? oops_end+0xbc/0xd0 [<c084b260>] ? do_general_protection+0x0/0x210 [<c084a9b7>] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that the we can only relay on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to relay on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks attention and lead to accidents. Reviewed-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Petr Matousek <pmatouse@redhat.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm: Check if PUD is large when validating a kernel addressMel Gorman2013-02-172-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream. A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110 [...] Call Trace: [<ffffffff811b8aaa>] read_kcore+0x17a/0x370 [<ffffffff811ad847>] proc_reg_read+0x77/0xc0 [<ffffffff81151687>] vfs_read+0xc7/0x130 [<ffffffff811517f3>] sys_read+0x53/0xa0 [<ffffffff81449692>] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it was a PMD. If it happens to look like pmd_none then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD though, it will be walked resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible and now they are running the backup program without accessing /proc/kcore so the patch has not been validated but I think it makes sense. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.coM> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86-64: Replace left over sti/cli in ia32 audit exit codeJan Beulich2013-02-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | commit 40a1ef95da85843696fc3ebe5fce39b0db32669f upstream. For some reason they didn't get replaced so far by their paravirt equivalents, resulting in code to be run with interrupts disabled that doesn't expect so (causing, in the observed case, a BUG_ON() to trigger) when syscall auditing is enabled. David (Cc-ed) came up with an identical fix, so likely this can be taken to count as an ack from him. Reported-by: Peter Moody <pmoody@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/5108E01902000078000BA9C5@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Tested-by: Peter Moody <pmoody@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCIH. Peter Anvin2013-02-031-0/+2
| | | | | | | | | | | | commit e43b3cec711a61edf047adf6204d542f3a659ef8 upstream. early_pci_allowed() and read_pci_config_16() are only available if CONFIG_PCI is defined. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
* efi, x86: Pass a proper identity mapping in efi_call_phys_prelogNathan Zimmer2013-02-031-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b8f2c21db390273c3eaf0e5308faeaeb1e233840 upstream. Update efi_call_phys_prelog to install an identity mapping of all available memory. This corrects a bug on very large systems with more then 512 GB in which bios would not be able to access addresses above not in the mapping. The result is a crash that looks much like this. BUG: unable to handle kernel paging request at 000000effd870020 IP: [<0000000078bce331>] 0x78bce330 PGD 0 Oops: 0000 [#1] SMP Modules linked in: CPU 0 Pid: 0, comm: swapper/0 Tainted: G W 3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform RIP: 0010:[<0000000078bce331>] [<0000000078bce331>] 0x78bce330 RSP: 0000:ffffffff81601d28 EFLAGS: 00010006 RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004 RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000 RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030 R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000 FS: 0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400) Stack: 0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff 0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400 0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a Call Trace: [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83 [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed [<ffffffff81035946>] ? efi_call4+0x46/0x80 [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305 [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2 [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60 [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1 [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120 [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163 Code: Bad RIP value. RIP [<0000000078bce331>] 0x78bce330 RSP <ffffffff81601d28> CR2: 000000effd870020 ---[ end trace ead828934fef5eab ]--- Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/msr: Add capabilities checkAlan Cox2013-02-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c903f0456bc69176912dee6dd25c6a66ee1aed00 upstream. At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: DMA: Fix struct page iterator in dma_cache_maint() to work with sparsememRussell King2013-02-031-8/+10
| | | | | | | | | | | | | | | | | | | | | | commit 15653371c67c3fbe359ae37b720639dd4c7b42c5 upstream. Subhash Jadavani reported this partial backtrace: Now consider this call stack from MMC block driver (this is on the ARMv7 based board): [<c001b50c>] (v7_dma_inv_range+0x30/0x48) from [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from [<c0017ff8>] (dma_map_sg+0x3c/0x114) This is caused by incrementing the struct page pointer, and running off the end of the sparsemem page array. Fix this by incrementing by pfn instead, and convert the pfn to a struct page. Suggested-by: James Bottomley <JBottomley@Parallels.com> Tested-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86: Use enum instead of literals for trap values [PARTIAL]Kees Cook2013-01-271-0/+26
| | | | | | | | | | | | | | | | | [Based on commit c94082656dac74257f63e91f78d5d458ac781fa5 upstream, only taking the traps.h portion.] The traps are referred to by their numbers and it can be difficult to understand them while reading the code without context. This patch adds enumeration of the trap numbers and replaces the numbers with the correct enum for x86. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20120310000710.GA32667@www.outflux.net Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.Frediano Ziglio2013-01-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9174adbee4a9a49d0139f5d71969852b36720809 upstream. This fixes CVE-2013-0190 / XSA-40 There has been an error on the xen_failsafe_callback path for failed iret, which causes the stack pointer to be wrong when entering the iret_exc error path. This can result in the kernel crashing. In the classic kernel case, the relevant code looked a little like: popl %eax # Error code from hypervisor jz 5f addl $16,%esp jmp iret_exc # Hypervisor said iret fault 5: addl $16,%esp # Hypervisor said segment selector fault Here, there are two identical addls on either option of a branch which appears to have been optimised by hoisting it above the jz, and converting it to an lea, which leaves the flags register unaffected. In the PVOPS case, the code looks like: popl_cfi %eax # Error from the hypervisor lea 16(%esp),%esp # Add $16 before choosing fault path CFI_ADJUST_CFA_OFFSET -16 jz 5f addl $16,%esp # Incorrectly adjust %esp again jmp iret_exc It is possible unprivileged userspace applications to cause this behaviour, for example by loading an LDT code selector, then changing the code selector to be not-present. At this point, there is a race condition where it is possible for the hypervisor to return back to userspace from an interrupt, fault on its own iret, and inject a failsafe_callback into the kernel. This bug has been present since the introduction of Xen PVOPS support in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: fix wii_memory_fixups() compile error on 3.0.y treeShuah Khan2013-01-211-2/+4
| | | | | | | | | | | | | | | | | | | | | [not upstream as the code involved was removed in the 3.3.0 release] Fix wii_memory_fixups() the following compile error on 3.0.y tree with wii_defconfig on 3.0.y tree. CC arch/powerpc/platforms/embedded6xx/wii.o arch/powerpc/platforms/embedded6xx/wii.c: In function ‘wii_memory_fixups’: arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format] cc1: all warnings being treated as errors make[2]: *** [arch/powerpc/platforms/embedded6xx/wii.o] Error 1 make[1]: *** [arch/powerpc/platforms/embedded6xx] Error 2 make: *** [arch/powerpc/platforms] Error 2 Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/Sandy Bridge: reserve pages when integrated graphics is presentJesse Barnes2013-01-211-0/+78
| | | | | | | | | | | | | | | | | | | | | | | commit a9acc5365dbda29f7be2884efb63771dc24bd815 upstream. SNB graphics devices have a bug that prevent them from accessing certain memory ranges, namely anything below 1M and in the pages listed in the table. So reserve those at boot if set detect a SNB gfx device on the CPU to avoid GPU hangs. Stephane Marchesin had a similar patch to the page allocator awhile back, but rather than reserving pages up front, it leaked them at allocation time. [ hpa: made a number of stylistic changes, marked arrays as static const, and made less verbose; use "memblock=debug" for full verbosity. ] Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/time: fix sched_clock() overflowHeiko Carstens2013-01-213-2/+30
| | | | | | | | | | | | | | | | | | | | | | commit ed4f20943cd4c7b55105c04daedf8d63ab6d499c upstream. Converting a 64 Bit TOD format value to nanoseconds means that the value must be divided by 4.096. In order to achieve that we multiply with 125 and divide by 512. When used within sched_clock() this triggers an overflow after appr. 417 days. Resulting in a sched_clock() return value that is much smaller than previously and therefore may cause all sort of weird things in subsystems that rely on a monotonic sched_clock() behaviour. To fix this implement a tod_to_ns() helper function which converts TOD values without overflow and call this function from both places that open coded the conversion: sched_clock() and kvm_s390_handle_wait(). Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sh: Fix FDPIC binary loaderThomas Schwinge2013-01-211-2/+2
| | | | | | | | | | | | | commit 4a71997a3279a339e7336ea5d0cd27282e2dea44 upstream. Ensure that the aux table is properly initialized, even when optional features are missing. Without this, the FDPIC loader did not work. This was meant to be included in commit d5ab780305bb6d60a7b5a74f18cf84eb6ad153b1. Signed-off-by: Thomas Schwinge <thomas@codesourcery.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: 44x: fix DCR read/writeAlexander Graf2013-01-171-0/+2
| | | | | | | | | | | | | commit e43a028752fed049e4bd94ef895542f96d79fa74 upstream. When remembering the direction of a DCR transaction, we should write to the same variable that we interpret on later when doing vcpu_run again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, amd: Disable way access filter on Piledriver CPUsAndre Przywara2013-01-171-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2bbf0a1427c377350f001fbc6260995334739ad7 upstream. The Way Access Filter in recent AMD CPUs may hurt the performance of some workloads, caused by aliasing issues in the L1 cache. This patch disables it on the affected CPUs. The issue is similar to that one of last year: http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html This new patch does not replace the old one, we just need another quirk for newer CPUs. The performance penalty without the patch depends on the circumstances, but is a bit less than the last year's 3%. The workloads affected would be those that access code from the same physical page under different virtual addresses, so different processes using the same libraries with ASLR or multiple instances of PIE-binaries. The code needs to be accessed simultaneously from both cores of the same compute unit. More details can be found here: http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf CPUs affected are anything with the core known as Piledriver. That includes the new parts of the AMD A-Series (aka Trinity) and the just released new CPUs of the FX-Series (aka Vishera). The model numbering is a bit odd here: FX CPUs have model 2, A-Series has model 10h, with possible extensions to 1Fh. Hence the range of model ids. Signed-off-by: Andre Przywara <osp@andrep.de> Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/vdso: Remove redundant locking in update_vsyscall_tz()Shan Hai2013-01-171-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ce73ec6db47af84d1466402781ae0872a9e7873c upstream. The locking in update_vsyscall_tz() is not only unnecessary because the vdso code copies the data unproteced in __kernel_gettimeofday() but also introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which causes user space process to loop forever in vdso code. The following patch removes the locking from update_vsyscall_tz(). Locking is not only unnecessary because the vdso code copies the data unprotected in __kernel_gettimeofday() but also erroneous because updating the tb_update_count is not atomic and introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which further causes user space process to loop forever in vdso code. The below scenario describes the race condition, x==0 Boot CPU other CPU proc_P: x==0 timer interrupt update_vsyscall x==1 x++;sync settimeofday update_vsyscall_tz x==2 x++;sync x==3 sync;x++ sync;x++ proc_P: x==3 (loops until x becomes even) Because the ++ operator would be implemented as three instructions and not atomic on powerpc. A similar change was made for x86 in commit 6c260d58634 ("x86: vdso: Remove bogus locking in update_vsyscall_tz") Signed-off-by: Shan Hai <shan.hai@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n buildAnton Blanchard2013-01-171-1/+1
| | | | | | | | | | | | | | commit 11ee7e99f35ecb15f59b21da6a82d96d2cd3fcc8 upstream. If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n, the kernel fails when we run at a non zero offset. It turns out we were incorrectly wrapping some of the relocatable kernel code with CONFIG_CRASH_DUMP. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* CRIS: fix I/O macrosCorey Minyard2013-01-111-6/+33
| | | | | | | | | | | | | | | | | | | | | | | commit c24bf9b4cc6a0f330ea355d73bfdf1dae7e63a05 upstream. The inb/outb macros for CRIS are broken from a number of points of view, missing () around parameters and they have an unprotected if statement in them. This was breaking the compile of IPMI on CRIS and thus I was being annoyed by build regressions, so I fixed them. Plus I don't think they would have worked at all, since the data values were missing "&" and the outsl had a "3" instead of a "4" for the size. From what I can tell, this stuff is not used at all, so this can't be any more broken than it was before, anyway. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <starvik@axis.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: missing ->mmap_sem around find_vma() in swp_emulate.cAl Viro2013-01-111-0/+2
| | | | | | | | | | | | | | | commit 7bf9b7bef881aac820bf1f2e9951a17b09bd7e04 upstream. find_vma() is *not* safe when somebody else is removing vmas. Not just the return value might get bogus just as you are getting it (this instance doesn't try to dereference the resulting vma), the search itself can get buggered in rather spectacular ways. IOW, ->mmap_sem really, really is not optional here. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: mm: use pteval_t to represent page protection valuesWill Deacon2013-01-111-1/+1
| | | | | | | | | | | | | | | | | | | commit 864aa04cd02979c2c755cb28b5f4fe56039171c0 upstream. When updating the page protection map after calculating the user_pgprot value, the base protection map is temporarily stored in an unsigned long type, causing truncation of the protection bits when LPAE is enabled. This effectively means that calls to mprotect() will corrupt the upper page attributes, clearing the XN bit unconditionally. This patch uses pteval_t to store the intermediate protection values, preserving the upper bits for 64-bit descriptors. Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sparc: huge_ptep_set_* functions need to call set_huge_pte_at()Dave Kleikamp2013-01-111-2/+8
| | | | | | | | | | | | | | | [ Upstream commit 6cb9c3697585c47977c42c5cc1b9fc49247ac530 ] Modifying the huge pte's requires that all the underlying pte's be modified. Version 2: added missing flush_tlb_page() Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, amd: Disable way access filter on Piledriver CPUsAndre Przywara2013-01-111-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2bbf0a1427c377350f001fbc6260995334739ad7 upstream. The Way Access Filter in recent AMD CPUs may hurt the performance of some workloads, caused by aliasing issues in the L1 cache. This patch disables it on the affected CPUs. The issue is similar to that one of last year: http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html This new patch does not replace the old one, we just need another quirk for newer CPUs. The performance penalty without the patch depends on the circumstances, but is a bit less than the last year's 3%. The workloads affected would be those that access code from the same physical page under different virtual addresses, so different processes using the same libraries with ASLR or multiple instances of PIE-binaries. The code needs to be accessed simultaneously from both cores of the same compute unit. More details can be found here: http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf CPUs affected are anything with the core known as Piledriver. That includes the new parts of the AMD A-Series (aka Trinity) and the just released new CPUs of the FX-Series (aka Vishera). The model numbering is a bit odd here: FX CPUs have model 2, A-Series has model 10h, with possible extensions to 1Fh. Hence the range of model ids. Signed-off-by: Andre Przywara <osp@andrep.de> Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Keep thread.dscr and thread.dscr_inherit in syncAnton Blanchard2012-12-172-2/+5
| | | | | | | | | | | | | | | | | | | | | | | commit 00ca0de02f80924dfff6b4f630e1dff3db005e35 upstream. When we update the DSCR either via emulation of mtspr(DSCR) or via a change to dscr_default in sysfs we don't update thread.dscr. We will eventually update it at context switch time but there is a period where thread.dscr is incorrect. If we fork at this point we will copy the old value of thread.dscr into the child. To avoid this, always keep thread.dscr in sync with reality. This issue was found with the following testcase: http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Update DSCR on all CPUs when writing sysfs dscr_defaultAnton Blanchard2012-12-171-0/+8
| | | | | | | | | | | | | | | | | | | commit 1b6ca2a6fe56e7697d57348646e07df08f43b1bb upstream. Writing to dscr_default in sysfs doesn't actually change the DSCR - we rely on a context switch on each CPU to do the work. There is no guarantee we will get a context switch in a reasonable amount of time so fire off an IPI to force an immediate change. This issue was found with the following test case: http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86: hpet: Fix masking of MSI interruptsJan Beulich2012-12-171-2/+2
| | | | | | | | | | | | | | commit 6acf5a8c931da9d26c8dd77d784daaf07fa2bff0 upstream. HPET_TN_FSB is not a proper mask bit; it merely toggles between MSI and legacy interrupt delivery. The proper mask bit is HPET_TN_ENABLE, so use both bits when (un)masking the interrupt. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5093E09002000078000A60E6@nat28.tlf.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/ptrace: Fix build with gcc 4.6Benjamin Herrenschmidt2012-12-171-4/+14
| | | | | | | | | | | | | | | | | | | | | | | commit e69b742a6793dc5bf16f6eedca534d4bc10d68b2 upstream. gcc (rightfully) complains that we are accessing beyond the end of the fpr array (we do, to access the fpscr). The only sane thing to do (whether anything in that code can be called remotely sane is debatable) is to special case fpscr and handle it as a separate statement. I initially tried to do it it by making the array access conditional to index < PT_FPSCR and using a 3rd else leg but for some reason gcc was unable to understand it and still spewed the warning. So I ended up with something a tad more intricated but it seems to build on 32-bit and on 64-bit with and without VSX. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: 7566/1: vfp: fix save and restore when running on pre-VFPv3 and ↵Paul Walmsley2012-12-173-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_VFPv3 set commit 39141ddfb63a664f26d3f42f64ee386e879b492c upstream. After commit 846a136881b8f73c1f74250bf6acfaa309cab1f2 ("ARM: vfp: fix saving d16-d31 vfp registers on v6+ kernels"), the OMAP 2430SDP board started crashing during boot with omap2plus_defconfig: [ 3.875122] mmcblk0: mmc0:e624 SD04G 3.69 GiB [ 3.915954] mmcblk0: p1 [ 4.086639] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM [ 4.093719] Modules linked in: [ 4.096954] CPU: 0 Not tainted (3.6.0-02232-g759e00b #570) [ 4.103149] PC is at vfp_reload_hw+0x1c/0x44 [ 4.107666] LR is at __und_usr_fault_32+0x0/0x8 It turns out that the context save/restore fix unmasked a latent bug in commit 5aaf254409f8d58229107b59507a8235b715a960 ("ARM: 6203/1: Make VFPv3 usable on ARMv6"). When CONFIG_VFPv3 is set, but the kernel is booted on a pre-VFPv3 core, the code attempts to save and restore the d16-d31 VFP registers. These are only present on non-D16 VFPv3+, so this results in an undefined instruction exception. The code didn't crash before commit 846a136 because the save and restore code was only touching d0-d15, present on all VFP. Fix by implementing a request from Russell King to add a new HWCAP flag that affirmatively indicates the presence of the d16-d31 registers: http://marc.info/?l=linux-arm-kernel&m=135013547905283&w=2 and some feedback from Måns to clarify the name of the HWCAP flag. Signed-off-by: Paul Walmsley <paul@pwsan.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Martin <dave.martin@linaro.org> Cc: Måns Rullgård <mans.rullgard@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Dove: Fix irq_to_pmu()Russell King - ARM Linux2012-12-101-1/+1
| | | | | | | | | | | | | | | commit d356cf5a74afa32b40decca3c9dd88bc3cd63eb5 upstream. PMU interrupts start at IRQ_DOVE_PMU_START, not IRQ_DOVE_PMU_START + 1. Fix the condition. (It may have been less likely to occur had the code been written "if (irq >= IRQ_DOVE_PMU_START" which imho is the easier to understand notation, and matches the normal way of thinking about these things.) Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Dove: Attempt to fix PMU/RTC interruptsRussell King - ARM Linux2012-12-101-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5d3df935426271016b895aecaa247101b4bfa35e upstream. Fix the acknowledgement of PMU interrupts on Dove: some Dove hardware has not been sensibly designed so that interrupts can be handled in a race free manner. The PMU is one such instance. The pending (aka 'cause') register is a bunch of RW bits, meaning that these bits can be both cleared and set by software (confirmed on the Armada-510 on the cubox.) Hardware sets the appropriate bit when an interrupt is asserted, and software is required to clear the bits which are to be processed. If we write ~(1 << bit), then we end up asserting every other interrupt except the one we're processing. So, we need to do a read-modify-write cycle to clear the asserted bit. However, any interrupts which occur in the middle of this cycle will also be written back as zero, which will also clear the new interrupts. The upshot of this is: there is _no_ way to safely clear down interrupts in this register (and other similarly behaving interrupt pending registers on this device.) The patch below at least stops us creating new interrupts. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86-32: Export kernel_stack_pointer() for modulesH. Peter Anvin2012-12-051-0/+2
| | | | | | | | | | | | | | | | | | commit cb57a2b4cff7edf2a4e32c0163200e9434807e0a upstream. Modules, in particular oprofile (and possibly other similar tools) need kernel_stack_pointer(), so export it using EXPORT_SYMBOL_GPL(). Cc: Yang Wei <wei.yang@windriver.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Jun Zhang <jun.zhang@intel.com> Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Robert Richter <rric@kernel.org> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Philip Müller <philm@manjaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, mce, therm_throt: Don't report power limit and package level thermal ↵Fenghua Yu2012-12-031-22/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | throttle events in mcelog commit 29e9bf1841e4f9df13b4992a716fece7087dd237 upstream. Thermal throttle and power limit events are not defined as MCE errors in x86 architecture and should not generate MCE errors in mcelog. Current kernel generates fake software defined MCE errors for these events. This may confuse users because they may think the machine has real MCE errors while actually only thermal throttle or power limit events happen. To make it worse, buggy firmware on some platforms may falsely generate the events. Therefore, kernel reports MCE errors which users think as real hardware errors. Although the firmware bugs should be fixed, on the other hand, kernel should not report MCE errors either. So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count, package_power_limit_count, core_throttle_count, and package_throttle_count) in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters for each event on each CPU. Please note that all CPU's on one package report duplicate counters. It's user application's responsibity to retrieve a package level counter for one package. This patch doesn't report package level power limit, core level power limit, and package level thermal throttle events in mcelog. When the events happen, only report them in respective counters in sysfs. Since core level thermal throttle has been legacy code in kernel for a while and users accepted it as MCE error in mcelog, core level thermal throttle is still reported in mcelog. In the mean time, the event is counted in a counter in sysfs as well. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Borislav Petkov <bp@amd64.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sparc64: not any error from do_sigaltstack() should fail rt_sigreturn()Al Viro2012-12-031-3/+1
| | | | | | | | | | | | | | | | | | commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 upstream. If a signal handler is executed on altstack and another signal comes, we will end up with rt_sigreturn() on return from the second handler getting -EPERM from do_sigaltstack(). It's perfectly OK, since we are not asking to change the settings; in fact, they couldn't have been changed during the second handler execution exactly because we'd been on altstack all along. 64bit sigreturn on sparc treats any error from do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics we are using on other architectures. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* PARISC: fix user-triggerable panic on pariscAl Viro2012-12-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 441a179dafc0f99fc8b3a8268eef66958621082e upstream. int sys32_rt_sigprocmask(int how, compat_sigset_t __user *set, compat_sigset_t __user *oset, unsigned int sigsetsize) { sigset_t old_set, new_set; int ret; if (set && get_sigset32(set, &new_set, sigsetsize)) ... static int get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz) { compat_sigset_t s; int r; if (sz != sizeof *set) panic("put_sigset32()"); In other words, rt_sigprocmask(69, (void *)69, 69) done by 32bit process will promptly panic the box. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* PARISC: fix virtual aliasing issue in get_shared_area()James Bottomley2012-12-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 949a05d03490e39e773e8652ccab9157e6f595b4 upstream. On Thu, 2012-11-01 at 16:45 -0700, Michel Lespinasse wrote: > Looking at the arch/parisc/kernel/sys_parisc.c implementation of > get_shared_area(), I do have a concern though. The function basically > ignores the pgoff argument, so that if one creates a shared mapping of > pages 0-N of a file, and then a separate shared mapping of pages 1-N > of that same file, both will have the same cache offset for their > starting address. > > This looks like this would create obvious aliasing issues. Am I > misreading this ? I can't understand how this could work good enough > to be undetected, so there must be something I'm missing here ??? This turns out to be correct and we need to pay attention to the pgoff as well as the address when creating the virtual address for the area. Fortunately, the bug is rarely triggered as most applications which use pgoff tend to use large values (git being the primary one, and it uses pgoff in multiples of 16MB) which are larger than our cache coherency modulus, so the problem isn't often seen in practise. Reported-by: Michel Lespinasse <walken@google.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, microcode, AMD: Add support for family 16h processorsBoris Ostrovsky2012-12-031-0/+4
| | | | | | | | | | | | | | | commit 36c46ca4f322a7bf89aad5462a3a1f61713edce7 upstream. Add valid patch size for family 16h processors. [ hpa: promoting to urgent/stable since it is hw enabling and trivial ] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com> Link: http://lkml.kernel.org/r/1353004910-2204-1-git-send-email-boris.ostrovsky@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86-32: Fix invalid stack address while in softirqRobert Richter2012-12-032-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1022623842cb72ee4d0dbf02f6937f38c92c3f41 upstream. In 32 bit the stack address provided by kernel_stack_pointer() may point to an invalid range causing NULL pointer access or page faults while in NMI (see trace below). This happens if called in softirq context and if the stack is empty. The address at &regs->sp is then out of range. Fixing this by checking if regs and &regs->sp are in the same stack context. Otherwise return the previous stack pointer stored in struct thread_info. If that address is invalid too, return address of regs. BUG: unable to handle kernel NULL pointer dereference at 0000000a IP: [<c1004237>] print_context_stack+0x6e/0x8d *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0 EIP is at print_context_stack+0x6e/0x8d EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000) Stack: 000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000 f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c Call Trace: [<c1003723>] dump_trace+0x7b/0xa1 [<c12dce1c>] x86_backtrace+0x40/0x88 [<c12db712>] ? oprofile_add_sample+0x56/0x84 [<c12db731>] oprofile_add_sample+0x75/0x84 [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260 [<c12dd40d>] profile_exceptions_notify+0x23/0x4c [<c1395034>] nmi_handle+0x31/0x4a [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13950ed>] do_nmi+0xa0/0x2ff [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13949e5>] nmi_stack_correct+0x28/0x2d [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c1003603>] ? do_softirq+0x4b/0x7f <IRQ> [<c102a06f>] irq_exit+0x35/0x5b [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a [<c1394746>] apic_timer_interrupt+0x2a/0x30 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0 CR2: 000000000000000a ---[ end trace 62afee3481b00012 ]--- Kernel panic - not syncing: Fatal exception in interrupt V2: * add comments to kernel_stack_pointer() * always return a valid stack address by falling back to the address of regs Reported-by: Yang Wei <wei.yang@windriver.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jun Zhang <jun.zhang@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kbuild: Fix gcc -x syntaxJean Delvare2012-11-262-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit b1e0d8b70fa31821ebca3965f2ef8619d7c5e316 upstream. The correct syntax for gcc -x is "gcc -x assembler", not "gcc -xassembler". Even though the latter happens to work, the former is what is documented in the manual page and thus what gcc wrappers such as icecream do expect. This isn't a cosmetic change. The missing space prevents icecream from recognizing compilation tasks it can't handle, leading to silent kernel miscompilations. Besides me, credits go to Michael Matz and Dirk Mueller for investigating the miscompilation issue and tracking it down to this incorrect -x parameter syntax. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Bernhard Walle <bernhard@bwalle.de> Cc: Michal Marek <mmarek@suse.cz> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* m68k: fix sigset_t accessor functionsAndreas Schwab2012-11-261-3/+3
| | | | | | | | | | | | | | | | | commit 34fa78b59c52d1db3513db4c1a999db26b2e9ac2 upstream. The sigaddset/sigdelset/sigismember functions that are implemented with bitfield insn cannot allow the sigset argument to be placed in a data register since the sigset is wider than 32 bits. Remove the "d" constraint from the asm statements. The effect of the bug is that sending RT signals does not work, the signal number is truncated modulo 32. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/gup: add missing TASK_SIZE check to get_user_pages_fast()Heiko Carstens2012-11-261-1/+1
| | | | | | | | | | | | | | | | | commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream. When walking page tables we need to make sure that everything is within bounds of the ASCE limit of the task's address space. Otherwise we might calculate e.g. a pud pointer which is not within a pud and dereference it. So check against TASK_SIZE (which is the ASCE limit) before walking page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facilityLen Brown2012-11-052-31/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f6365201d8a21fb347260f89d6e9b3e718d63c70 upstream. The X86_32-only disable_hlt/enable_hlt mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT. This workaround was commented in the code as: "disable hlt during certain critical i/o operations" "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove." H. Peter Anvin additionally adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much hhigher noise levels you got when Linux used HLT caused some of these systems to fail. They were by far in the minority even back then." Alan Cox further says: "Also for the Cyrix 5510 which tended to go castors up if a HLT occurred during a DMA cycle and on a few other boxes HLT during DMA tended to go astray. Do we care ? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use." So, let's finally drop this. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephen Hemminger <shemminger@vyatta.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org [ If anyone cares then alternative instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, mm: Undo incorrect revert in arch/x86/mm/init.cYinghai Lu2012-10-311-4/+0
| | | | | | | | | | | | | | | | | | | | | commit f82f64dd9f485e13f29f369772d4a0e868e5633a upstream. Commit 844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped added back some lines back wrongly that has been removed in commit 7b16bbf97 Revert "x86/mm: Fix the size calculation of mapping tables" remove them again. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com Acked-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86, mm: Find_early_table_space based on ranges that are actually being mappedJacob Shin2012-10-311-30/+43
| | | | | | | | | | | | | | | | | | | | | | commit 844ab6f993b1d32eb40512503d35ff6ad0c57030 upstream. Current logic finds enough space for direct mapping page tables from 0 to end. Instead, we only need to find enough space to cover mr[0].start to mr[nr_range].end -- the range that is actually being mapped by init_memory_mapping() This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address the panic reported here: https://lkml.org/lkml/2012/10/20/160 https://lkml.org/lkml/2012/10/21/157 Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie Tested-by: Tom Rini <trini@ti.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: at91/i2c: change id to let i2c-gpio workBo Shen2012-10-315-5/+5
| | | | | | | | | | | | | | | | | | | | | commit 7840487cd6298f9f931103b558290d8d98d41c49 upstream. The i2c core driver will turn the platform device ID to busnum When using platfrom device ID as -1, it means dynamically assigned the busnum. When writing code, we need to make sure the busnum, and call i2c_register_board_info(int busnum, ...) to register device if using -1, we do not know the value of busnum In order to solve this issue, set the platform device ID as a fix number Here using 0 to match the busnum used in i2c_regsiter_board_info() Signed-off-by: Bo Shen <voice.shen@atmel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: 7559/1: smp: switch away from the idmap before updating init_mm.mm_countWill Deacon2012-10-311-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5f40b909728ad784eb43aa309d3c4e9bdf050781 upstream. When booting a secondary CPU, the primary CPU hands two sets of page tables via the secondary_data struct: (1) swapper_pg_dir: a normal, cacheable, shared (if SMP) mapping of the kernel image (i.e. the tables used by init_mm). (2) idmap_pgd: an uncached mapping of the .idmap.text ELF section. The idmap is generally used when enabling and disabling the MMU, which includes early CPU boot. In this case, the secondary CPU switches to swapper as soon as it enters C code: struct mm_struct *mm = &init_mm; unsigned int cpu = smp_processor_id(); /* * All kernel threads share the same mm context; grab a * reference and switch to it. */ atomic_inc(&mm->mm_count); current->active_mm = mm; cpumask_set_cpu(cpu, mm_cpumask(mm)); cpu_switch_mm(mm->pgd, mm); This causes a problem on ARMv7, where the identity mapping is treated as strongly-ordered leading to architecturally UNPREDICTABLE behaviour of exclusive accesses, such as those used by atomic_inc. This patch re-orders the secondary_start_kernel function so that we switch to swapper before performing any exclusive accesses. Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> Cc: David McKay <david.mckay@st.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sparc64: Be less verbose during vmemmap population.David S. Miller2012-10-281-5/+23
| | | | | | | | | | | | | | | | | | | [ Upstream commit 2856cc2e4d0852c3ddaae9dcb19cb9396512eb08 ] On a 2-node machine with 256GB of ram we get 512 lines of console output, which is just too much. This mimicks Yinghai Lu's x86 commit c2b91e2eec9678dbda274e906cc32ea8f711da3b (x86_64/mm: check and print vmemmap allocation continuous) except that we aren't ever going to get contiguous block pointers in between calls so just print when the virtual address or node changes. This decreases the output by an order of 16. Also demote this to KERN_DEBUG. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>