summaryrefslogtreecommitdiffstats
path: root/arch/x86/mm/pat.c
Commit message (Collapse)AuthorAgeFilesLines
* x86/io: add interface to reserve io memtype for a resource range. (v1.1)Dave Airlie2018-09-091-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8ef4227615e158faa4ee85a1d6466782f7e22f2f upstream. A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed() started enforcing checking the memory type against the registered list for amixed pfn insertion mappings. It happens that the drm drivers for a number of gpus relied on this being broken. Currently the driver only inserted VRAM mappings into the tracking table when they came from the kernel, and userspace mappings never landed in the table. This led to a regression where all the mapping end up as UC instead of WC now. I've considered a number of solutions but since this needs to be fixed in fixes and not next, and some of the solutions were going to introduce overhead that hadn't been there before I didn't consider them viable at this stage. These mainly concerned hooking into the TTM io reserve APIs, but these API have a bunch of fast paths I didn't want to unwind to add this to. The solution I've decided on is to add a new API like the arch_phys_wc APIs (these would have worked but wc_del didn't take a range), and use them from the drivers to add a WC compatible mapping to the table for all VRAM on those GPUs. This means we can then create userspace mapping that won't get degraded to UC. v1.1: use CONFIG_X86_PAT + add some comments in io.h Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat, /dev/mem: Remove superfluous error messageJiri Kosina2018-01-171-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 39380b80d72723282f0ea1d1bbf2294eae45013e upstream. Currently it's possible for broken (or malicious) userspace to flood a kernel log indefinitely with messages a-la Program dmidecode tried to access /dev/mem between f0000->100000 because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on dumps this information each and every time devmem_is_allowed() fails. Reportedly userspace that is able to trigger contignuous flow of these messages exists. It would be possible to rate limit this message, but that'd have a questionable value; the administrator wouldn't get information about all the failing accessess, so then the information would be both superfluous and incomplete at the same time :) Returning EPERM (which is what is actually happening) is enough indication for userspace what has happened; no need to log this particular error as some sort of special condition. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Don't report PAT on CPUs that don't support itMikulas Patocka2017-07-151-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 99c13b8c8896d7bcb92753bf0c63a8de4326e78d upstream. The pat_enabled() logic is broken on CPUs which do not support PAT and where the initialization code fails to call pat_init(). Due to that the enabled flag stays true and pat_enabled() returns true wrongfully. As a consequence the mappings, e.g. for Xorg, are set up with the wrong caching mode and the required MTRR setups are omitted. To cure this the following changes are required: 1) Make pat_enabled() return true only if PAT initialization was invoked and successful. 2) Invoke init_cache_modes() unconditionally in setup_arch() and remove the extra callsites in pat_disable() and the pat disabled code path in pat_init(). Also rename __pat_enabled to pat_disabled to reflect the real purpose of this variable. Fixes: 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bernhard Held <berny156@gmx.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: "Luis R. Rodriguez" <mcgrof@suse.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386Toshi Kani2016-08-161-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1886297ce0c8d563a08c8a8c4c0b97743e06cd37 upstream. The following BUG_ON() crash was reported on QEMU/i386: kernel BUG at arch/x86/mm/physaddr.c:79! Call Trace: phys_mem_access_prot_allowed mmap_mem ? mmap_region mmap_region do_mmap vm_mmap_pgoff SyS_mmap_pgoff do_int80_syscall_32 entry_INT80_32 after commit: edfe63ec97ed ("x86/mtrr: Fix Xorg crashes in Qemu sessions") PAT is now set to disabled state when MTRRs are disabled. Thus, reactivating the __pa(high_memory) check in phys_mem_access_prot_allowed(). When CONFIG_DEBUG_VIRTUAL is set, __pa() calls __phys_addr(), which in turn calls slow_virt_to_phys() for 'high_memory'. Because 'high_memory' is set to (the max direct mapped virt addr + 1), it is not a valid virtual address. Hence, slow_virt_to_phys() returns 0 and hit the BUG_ON. Using __pa_nodebug() instead of __pa() will fix this BUG_ON. However, this code block, originally written for Pentiums and earlier, is no longer adequate since a 32-bit Xen guest has MTRRs disabled and supports ZONE_HIGHMEM. In this setup, this code sets UC attribute for accessing RAM in high memory range. Delete this code block as it has been unused for a long time. Reported-by: kernel test robot <ying.huang@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1460403360-25441-1-git-send-email-toshi.kani@hpe.com Link: https://lkml.org/lkml/2016/4/1/608 Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/xen, pat: Remove PAT table init code from XenToshi Kani2016-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 88ba281108ed0c25c9d292b48bd3f272fcb90dd0 upstream. Xen supports PAT without MTRRs for its guests. In order to enable WC attribute, it was necessary for xen_start_kernel() to call pat_init_cache_modes() to update PAT table before starting guest kernel. Now that the kernel initializes PAT table to the BIOS handoff state when MTRR is disabled, this Xen-specific PAT init code is no longer necessary. Delete it from xen_start_kernel(). Also change __init_cache_modes() to a static function since PAT table should not be tweaked by other modules. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Toshi Kani <toshi.kani@hp.com> Cc: elliott@hpe.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Replace cpu_has_pat with boot_cpu_has()Toshi Kani2016-08-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d63dcf49cf5ae5605f4d14229e3888e104f294b1 upstream. Borislav Petkov suggested: > Please use on init paths boot_cpu_has(X86_FEATURE_PAT) and on fast > paths static_cpu_has(X86_FEATURE_PAT). No more of that cpu_has_XXX > ugliness. Replace the use of cpu_has_pat on init paths with boot_cpu_has(). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: konrad.wilk@oracle.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-4-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Add pat_disable() interfaceToshi Kani2016-08-161-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 224bb1e5d67ba0f2872c98002d6a6f991ac6fd4a upstream. In preparation for fixing a regression caused by: 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled") ... PAT needs to provide an interface that prevents the OS from initializing the PAT MSR. PAT MSR initialization must be done on all CPUs using the specific sequence of operations defined in the Intel SDM. This requires MTRRs to be enabled since pat_init() is called as part of MTRR init from mtrr_rendezvous_handler(). Make pat_disable() as the interface that prevents the OS from initializing the PAT MSR. MTRR will call this interface when it cannot provide the SDM-defined sequence to initialize PAT. This also assures that pat_disable() called from pat_bsp_init() will set the PAT table properly when CPU does not support PAT. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: konrad.wilk@oracle.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Add support of non-default PAT MSR settingToshi Kani2016-08-161-20/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 02f037d641dc6672be5cfe7875a48ab99b95b154 upstream. In preparation for fixing a regression caused by: 9cd25aac1f44 ("x86/mm/pat: Emulate PAT when it is disabled")' ... PAT needs to support a case that PAT MSR is initialized with a non-default value. When pat_init() is called and PAT is disabled, it initializes the PAT table with the BIOS default value. Xen, however, sets PAT MSR with a non-default value to enable WC. This causes inconsistency between the PAT table and PAT MSR when PAT is set to disable on Xen. Change pat_init() to handle the PAT disable cases properly. Add init_cache_modes() to handle two cases when PAT is set to disable. 1. CPU supports PAT: Set PAT table to be consistent with PAT MSR. 2. CPU does not support PAT: Set PAT table to be consistent with PWT and PCD bits in a PTE. Note, __init_cache_modes(), renamed from pat_init_cache_modes(), will be changed to a static function in a later patch. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Toshi Kani <toshi.kani@hp.com> Cc: elliott@hpe.com Cc: konrad.wilk@oracle.com Cc: paul.gortmaker@windriver.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm/pat: Extend set_page_memtype() to support Write-Through typeToshi Kani2015-06-071-30/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As set_memory_wb() calls free_ram_pages_type(), which then calls set_page_memtype() with -1, _PGMT_DEFAULT is used for tracking the WB type. _PGMT_WB is defined but unused. Thus, rename _PGMT_DEFAULT to _PGMT_WB to clarify the usage, and release the slot used by _PGMT_WB. Furthermore, change free_ram_pages_type() to call set_page_memtype() with _PGMT_WB, and get_page_memtype() to return _PAGE_CACHE_MODE_WB for _PGMT_WB. Then, define _PGMT_WT in the freed slot. This allows set_page_memtype() to track the WT type. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-12-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Add pgprot_writethrough()Toshi Kani2015-06-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add pgprot_writethrough() for setting page protection flags to Write-Through mode. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Change reserve_memtype() for Write-Through typeToshi Kani2015-06-071-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a target range is in RAM, reserve_ram_pages_type() verifies the requested type. Change it to fail WT and WP requests with -EINVAL since set_page_memtype() is limited to handle three types: WB, WC and UC-. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Use 7th PAT MSR slot for Write-Through PAT typeToshi Kani2015-06-071-9/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assign Write-Through type to the PA7 slot in the PAT MSR when the processor is not affected by PAT errata. The PA7 slot is chosen to improve robustness in the presence of errata that might cause the high PAT bit to be ignored. This way a buggy PA7 slot access will hit the PA3 slot, which is UC, so at worst we lose performance without causing a correctness issue. The following Intel processors are affected by the PAT errata. Errata CPUID ---------------------------------------------------- Pentium 2, A52 family 0x6, model 0x5 Pentium 3, E27 family 0x6, model 0x7, 0x8 Pentium 3 Xenon, G26 family 0x6, model 0x7, 0x8, 0xa Pentium M, Y26 family 0x6, model 0x9 Pentium M 90nm, X9 family 0x6, model 0xd Pentium 4, N46 family 0xf, model 0x0 Instead of making sharp boundary checks, we remain conservative and exclude all Pentium 2, 3, M and 4 family processors. For those, _PAGE_CACHE_MODE_WT is redirected to UC- per the default setup in __cachemode2pte_tbl[]. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: https://lkml.kernel.org/r/1433187393-22688-2-git-send-email-toshi.kani@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Remove pat_enabled() checksBorislav Petkov2015-06-071-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we emulate a PAT table when PAT is disabled, there's no need for those checks anymore as the PAT abstraction will handle those cases too. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Emulate PAT when it is disabledBorislav Petkov2015-06-071-28/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see 9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Untangle pat_init()Borislav Petkov2015-06-071-29/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | Split it into a BSP and AP version which makes the PAT initialization path actually readable again. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Export pat_enabled()Luis R. Rodriguez2015-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two Linux device drivers cannot work with PAT and the work required to make them work is significant. There is not enough motivation to convert these drivers over to use PAT properly, the compromise reached is to let drivers that cannot be ported to PAT check if PAT was enabled and if so fail on probe with a recommendation to boot with the "nopat" kernel parameter. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-14-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Wrap pat_enabled into a function APILuis R. Rodriguez2015-05-271-18/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use pat_enabled in x86-specific code to see if PAT is enabled or not but we're granting full access to it even though readers do not need to set it. If, for instance, we granted access to it to modules later they then could override the variable setting... no bueno. This renames pat_enabled to a new static variable __pat_enabled. Folks are redirected to use pat_enabled() now. Code that sets this can only be internal to pat.c. Apart from the early kernel parameter "nopat" to disable PAT, we also have a few cases that disable it later and make use of a helper pat_disable(). It is wrapped under an ifdef but since that code cannot run unless PAT was enabled its not required to wrap it with ifdefs, unwrap that. Likewise, since "nopat" doesn't really change non-PAT systems just remove that ifdef as well. Although we could add and use an early_param_off(), these helpers don't use __read_mostly but we want to keep __read_mostly for __pat_enabled as this is a hot path -- upon boot, for instance, a simple guest may see ~4k accesses to pat_enabled(). Since __read_mostly early boot params are not that common we don't add a helper for them just yet. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kyle McMartin <kyle@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Convert to pr_*() usageLuis R. Rodriguez2015-05-271-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use pr_info() instead of the old printk to prefix the component where things are coming from. With this readers will know exactly where the message is coming from. We use pr_* helpers but define pr_fmt to the empty string for easier grepping for those error messages. We leave the users of dprintk() in place, this will print only when the debugpat kernel parameter is enabled. We want to leave those enabled as a debug feature, but also make them use the same prefix. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> [ Kill pr_fmt. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cocci@systeme.lip6.fr Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpersToshi Kani2015-05-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets set to 1 when a given range is covered uniformly by MTRRs, i.e. the range is fully covered by a single MTRR entry or the default type. Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to see if it is safe to create a huge page mapping in the range. This allows them to create a huge page mapping in a range covered by a single MTRR entry of any memory type. It also detects a non-optimal request properly. They continue to check with the WB type since it does not effectively change the uniform mapping even if a request spans multiple MTRR entries. pmd_set_huge() logs a warning message to a non-optimal request so that driver writers will be aware of such a case. Drivers should make a mapping request aligned to a single MTRR entry when the range is covered by MTRRs. Signed-off-by: Toshi Kani <toshi.kani@hp.com> [ Realign, flesh out comments, improve warning message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT casesPavel Machek2015-02-191-3/+3
| | | | | | | | | | | STRICT_DEVMEM and PAT produce same failure accessing /dev/mem, which is quite confusing to the user. Make printk messages different to lessen confusion. Signed-off-by: Pavel Machek <pavel@ucw.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86: Don't rely on VMWare emulating PAT MSR correctlyJuergen Gross2015-01-201-1/+6
| | | | | | | | | | | | | | | | | VMWare seems not to emulate the PAT MSR correctly: reaeding MSR_IA32_CR_PAT returns 0 even after writing another value to it. Commit bd809af16e3ab triggers this VMWare bug when the kernel is booted as a VMWare guest. Detect this bug and don't use the read value if it is 0. Fixes: bd809af16e3ab "x86: Enable PAT to use cache mode translation tables" Reported-and-tested-by: Jongman Heo <jongman.heo@samsung.com> Acked-by: Alok N Kataria <akataria@vmware.com> Signed-off-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1421039745-14335-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2014-12-101-63/+182
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm tree changes from Ingo Molnar: "The biggest change is full PAT support from Jürgen Gross: The x86 architecture offers via the PAT (Page Attribute Table) a way to specify different caching modes in page table entries. The PAT MSR contains 8 entries each specifying one of 6 possible cache modes. A pte references one of those entries via 3 bits: _PAGE_PAT, _PAGE_PWT and _PAGE_PCD. The Linux kernel currently supports only 4 different cache modes. The PAT MSR is set up in a way that the setting of _PAGE_PAT in a pte doesn't matter: the top 4 entries in the PAT MSR are the same as the 4 lower entries. This results in the kernel not supporting e.g. write-through mode. Especially this cache mode would speed up drivers of video cards which now have to use uncached accesses. OTOH some old processors (Pentium) don't support PAT correctly and the Xen hypervisor has been using a different PAT MSR configuration for some time now and can't change that as this setting is part of the ABI. This patch set abstracts the cache mode from the pte and introduces tables to translate between cache mode and pte bits (the default cache mode "write back" is hard-wired to PAT entry 0). The tables are statically initialized with values being compatible to old processors and current usage. As soon as the PAT MSR is changed (or - in case of Xen - is read at boot time) the tables are changed accordingly. Requests of mappings with special cache modes are always possible now, in case they are not supported there will be a fallback to a compatible but slower mode. Summing it up, this patch set adds the following features: - capability to support WT and WP cache modes on processors with full PAT support - processors with no or uncorrect PAT support are still working as today, even if WT or WP cache mode are selected by drivers for some pages - reduction of Xen special handling regarding cache mode Another change is a boot speedup on ridiculously large RAM systems, plus other smaller fixes" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86: mm: Move PAT only functions to mm/pat.c xen: Support Xen pv-domains using PAT x86: Enable PAT to use cache mode translation tables x86: Respect PAT bit when copying pte values between large and normal pages x86: Support PAT bit in pagetable dump for lower levels x86: Clean up pgtable_types.h x86: Use new cache mode type in memtype related functions x86: Use new cache mode type in mm/ioremap.c x86: Use new cache mode type in setting page attributes x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert() x86: Use new cache mode type in mm/iomap_32.c x86: Use new cache mode type in asm/pgtable.h x86: Use new cache mode type in arch/x86/mm/init_64.c x86: Use new cache mode type in arch/x86/pci x86: Use new cache mode type in drivers/video/fbdev/vermilion x86: Use new cache mode type in drivers/video/fbdev/gbefb.c x86: Use new cache mode type in include/asm/fb.h x86: Make page cache mode a real type x86: mm: Use 2GB memory block size on large-memory x86-64 systems ...
| * x86: mm: Move PAT only functions to mm/pat.cThomas Gleixner2014-11-161-0/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e00c8cc93c1a "x86: Use new cache mode type in memtype related functions" broke the ARCH=um build. arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type static inline enum page_cache_mode get_page_memtype(struct page *pg) The reason is simple. get_page_memtype() and set_page_memtype() require enum page_cache_mode now, which is defined in asm/pgtable_types.h. UM does not include that file for obvious reasons. The simple solution is to move that functions to arch/x86/mm/pat.c where the only callsites of this are located. They should have been there in the first place. Fixes: e00c8cc93c1a "x86: Use new cache mode type in memtype related functions" Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Weinberger <richard@nod.at>
| * x86: Enable PAT to use cache mode translation tablesJuergen Gross2014-11-161-2/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the translation tables from cache mode to pgprot values according to the PAT settings. This enables changing the cache attributes of a PAT index in just one place without having to change at the users side. With this change it is possible to use the same kernel with different PAT configurations, e.g. supporting Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Use new cache mode type in memtype related functionsJuergen Gross2014-11-161-53/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of directly using the cache mode bits in the pte switch to using the cache mode type. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Use new cache mode type in mm/ioremap.cJuergen Gross2014-11-161-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of directly using the cache mode bits in the pte switch to using the cache mode type. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()Juergen Gross2014-11-161-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of directly using the cache mode bits in the pte switch to using the cache mode type. As those are the main callers of lookup_memtype(), change this as well. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-10-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Use new cache mode type in mm/iomap_32.cJuergen Gross2014-11-161-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of directly using the cache mode bits in the pte switch to using the cache mode type. This requires to change io_reserve_memtype() as well. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86: Use new cache mode type in asm/pgtable.hJuergen Gross2014-11-161-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of directly using the cache mode bits in the pte switch to using the cache mode type. This requires changing some callers of is_new_memtype_allowed() to be changed as well. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: Replace seq_printf() with seq_puts()Rasmus Villemoes2014-12-081-1/+1
|/ | | | | | | | | seq_puts is a lot cheaper than seq_printf, so use that to print literal strings. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: http://lkml.kernel.org/r/1417208622-12264-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86: Do not try to sync identity map for non-mapped pagesDave Hansen2013-03-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | kernel_map_sync_memtype() is called from a variety of contexts. The pat.c code that calls it seems to ensure that it is not called for non-ram areas by checking via pat_pagerange_is_ram(). It is important that it only be called on the actual identity map because there *IS* no map to sync for highmem pages, or for memory holes. The ioremap.c uses are not as careful as those from pat.c, and call kernel_map_sync_memtype() on PCI space which is in the middle of the kernel identity map _range_, but is not actually mapped. This patch adds a check to kernel_map_sync_memtype() which probably duplicates some of the checks already in pat.c. But, it is necessary for the ioremap.c uses and shouldn't hurt other callers. I have reproduced this bug and this patch fixes it for me and the original bug reporter: https://lkml.org/lkml/2013/2/5/396 Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com Signed-off-by: Dave Hansen <dave@sr71.net> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* x86, mm: Make DEBUG_VIRTUAL work earlier in bootDave Hansen2013-01-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The KVM code has some repeated bugs in it around use of __pa() on per-cpu data. Those data are not in an area on which using __pa() is valid. However, they are also called early enough in boot that __vmalloc_start_set is not set, and thus the CONFIG_DEBUG_VIRTUAL debugging does not catch them. This adds a check to also verify __pa() calls against max_low_pfn, which we can use earler in boot than is_vmalloc_addr(). However, if we are super-early in boot, max_low_pfn=0 and this will trip on every call, so also make sure that max_low_pfn is set before we try to use it. With this patch applied, CONFIG_DEBUG_VIRTUAL will actually catch the bug I was chasing (and fix later in this series). I'd love to find a generic way so that any __pa() call on percpu areas could do a BUG_ON(), but there don't appear to be any nice and easy ways to check if an address is a percpu one. Anybody have ideas on a way to do this? Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212430.F46F8159@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* mm, x86, pat: rework linear pfn-mmap trackingKonstantin Khlebnikov2012-10-091-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT. We can toss mapping address from remap_pfn_range() into track_pfn_vma_new(), and collect all PAT-related logic together in arch/x86/. This patch also restores orignal frustration-free is_cow_mapping() check in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73 ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3") is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c, because it already handled by VM_PFNMAP in VM_NO_THP bit-mask. [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86, pat: separate the pfn attribute tracking for remap_pfn_range and ↵Suresh Siddha2012-10-091-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vm_insert_pfn With PAT enabled, vm_insert_pfn() looks up the existing pfn memory attribute and uses it. Expectation is that the driver reserves the memory attributes for the pfn before calling vm_insert_pfn(). remap_pfn_range() (when called for the whole vma) will setup a new attribute (based on the prot argument) for the specified pfn range. This addresses the legacy usage which typically calls remap_pfn_range() with a desired memory attribute. For ranges smaller than the vma size (which is typically not the case), remap_pfn_range() will use the existing memory attribute for the pfn range. Expose two different API's for these different behaviors. track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn() and track_pfn_remap() for the remap_pfn_range(). This cleanup also prepares the ground for the track/untrack pfn vma routines to take over the ownership of setting PAT specific vm_flag in the 'vma'. [khlebnikov@openvz.org: Clear checks in track_pfn_remap()] [akpm@linux-foundation.org: tweak a few comments] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routinesSuresh Siddha2012-10-091-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'pfn' argument for track_pfn_vma_new() can be used for reserving the attribute for the pfn range. No need to depend on 'vm_pgoff' Similarly, untrack_pfn_vma() can depend on the 'pfn' argument if it is non-zero or can use follow_phys() to get the starting value of the pfn range. Also the non zero 'size' argument can be used instead of recomputing it from vma. This cleanup also prepares the ground for the track/untrack pfn vma routines to take over the ownership of setting PAT specific vm_flag in the 'vma'. [khlebnikov@openvz.org: Clear pfn to paddr conversion] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'x86/trampoline' into x86/urgentH. Peter Anvin2012-05-301-23/+19
|\ | | | | | | | | | | | | x86/trampoline contains an urgent commit which is necessarily on a newer baseline. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86: print physical addresses consistently with other parts of kernelBjorn Helgaas2012-05-291-23/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -found SMP MP-table at [ffff8800000fce90] fce90 +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90] -initial memory mapped : 0 - 20000000 +initial memory mapped: [mem 0x00000000-0x1fffffff] -Base memory trampoline at [ffff88000009c000] 9c000 size 8192 +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000] -SRAT: Node 0 PXM 0 0-80000000 +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | x86/mm/pat: Improve scaling of pat_pagerange_is_ram()John Dykstra2012-05-301-20/+36
|/ | | | | | | | | | | | | | | | | | | | | Function pat_pagerange_is_ram() scales poorly to large address ranges, because it probes the resource tree for each page. On a 2.6 GHz Opteron, this function consumes 34 ms for a 1 GB range. It is called twice during untrack_pfn_vma(), slowing process cleanup and handicapping the OOM killer. This replacement consumes less than 1ms, under the same conditions. Signed-off-by: John Dykstra <jdykstra@cray.com> on behalf of Cray Inc. Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337980366.1979.6.camel@redwood [ Small stylistic cleanups and renames ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2010-08-061-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Ioremap: fix wrong physical address handling in PAT code x86, tlb: Clean up and correct used type x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86, mm: Create symbolic index into address_markers array x86, ioremap: Fix normal ram range check x86, ioremap: Fix incorrect physical address handling in PAE mode x86-64, mm: Initialize VDSO earlier on 64 bits x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
| * x86: Ioremap: fix wrong physical address handling in PAT codeYasuaki Ishimatsu2010-07-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following two commits fixed a problem that x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. ffa71f33a820d1ab3f2fc5723819ac60fb76080b (x86, ioremap: Fix incorrect physical address handling in PAE mode) 35be1b716a475717611b2dc04185e9d80b9cb693 (x86, ioremap: Fix normal ram range check) But these fixes are not enough, since pat_pagerange_is_ram() in PAT code also has a same problem. This patch fixes it. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C47DDCF.80300@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86, pat: Proper init of memtype subtree_max_endVenkatesh Pallipadi2010-06-111-1/+1
|/ | | | | | | | | | | | | | | | | | | subtree_max_end that was recently added to struct memtype was not getting properly initialized resulting in WARNING: kmemcheck: Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb() reported here https://bugzilla.kernel.org/show_bug.cgi?id=16092 This change fixes the problem. Reported-by: Christian Casteyde <casteyde.christian@free.fr> Tested-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
* x86, pat: Fix memory leak in free_memtypeXiaotian Feng2010-05-261-3/+7
| | | | | | | | | | | | | | | | | | | | | | Reserve_memtype will allocate memory for new memtype, but in free_memtype, after the memtype erased from rbtree, the memory is not freed. Changes since V1: make rbt_memtype_erase return erased memtype so that it can be freed in free_memtype. [ hpa: not for -stable: 2.6.34 and earlier not affected ] Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* Merge branch 'x86-pat-for-linus' of ↵Linus Torvalds2010-05-181-220/+19
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Update the page flags for memtype atomically instead of using memtype_lock x86, pat: In rbt_memtype_check_insert(), update new->type only if valid x86, pat: Migrate to rbtree only backend for pat memtype management x86, pat: Preparatory changes in pat.c for bigger rbtree change rbtree: Add support for augmented rbtrees
| * x86, pat: Update the page flags for memtype atomically instead of using ↵Robin Holt2010-04-231-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memtype_lock While testing an application using the xpmem (out of kernel) driver, we noticed a significant page fault rate reduction of x86_64 with respect to ia64. For one test running with 32 cpus, one thread per cpu, it took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages. For the same test running on 256 cpus, one thread per cpu, it took 14:48 to vm_insert_pfn 2 GB worth of pages. The slowdown was tracked to lookup_memtype which acquires the spinlock memtype_lock. This heavily contended lock was slowing down vm_insert_pfn(). With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu cases take approx 00:01.3 seconds to complete. Signed-off-by: Robin Holt <holt@sgi.com> LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Cc: Rafael Wysocki <rjw@novell.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, pat: Migrate to rbtree only backend for pat memtype managementPallipadi, Venkatesh2010-02-181-204/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move pat backend to fully rbtree based implementation from the existing rbtree and linked list hybrid. New rbtree based solution uses interval trees (augmented rbtrees) in order to store the PAT ranges. The new code seprates out the pat backend to pat_rbtree.c file, making is cleaner. The change also makes the PAT lookup, reserve and free operations more optimal, as we don't have to traverse linear linked list of few tens of entries in normal case. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210232607.GB11465@linux-os.sc.intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, pat: Preparatory changes in pat.c for bigger rbtree changevenkatesh.pallipadi@intel.com2010-02-181-82/+88
| | | | | | | | | | | | | | | | | | | | | | Minor changes in pat.c to cleanup code and make it smoother to introduce bigger rbtree only change in the following patch. The changes are cleaup only and should not have any functional impact. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210195909.792781000@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* vfs: Implement proper O_SYNC semanticsChristoph Hellwig2009-12-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
* Merge branch 'x86-pat-for-linus' of ↵Linus Torvalds2009-12-081-6/+1
|\ | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: pat: Remove ioremap_default() x86: pat: Clean up req_type special case for reserve_memtype() x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
| * x86: pat: Clean up req_type special case for reserve_memtype()Xiaotian Feng2009-11-101-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated code in pat_x_mtrr_type() and reserve_memtype(), which removed the special case (req_type is -1) for the PAT-enabled part. We should also change comments and the PAT-disabled part. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>