summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* xen/kbdif: add multi-touch supportOleksandr Andrushchenko2017-05-021-0/+210
| | | | | | | | | | Multi-touch fields re-use the page that is used by the other features which means that you can interleave multi-touch, motion, and key events. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/kbdif: update protocol descriptionOleksandr Andrushchenko2017-05-021-27/+221
| | | | | | | | | The patch clarifies the protocol that is used by the PV keyboard drivers. Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/9pfs: build 9pfs Xen transport driverStefano Stabellini2017-05-022-0/+12
| | | | | | | | | | | | | | | | | This patch adds a Kconfig option and Makefile support for building the 9pfs Xen driver. CC: groug@kaod.org CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/9pfs: receive responsesStefano Stabellini2017-05-021-0/+56
| | | | | | | | | | | | | | | | | | | | Upon receiving a notification from the backend, schedule the p9_xen_response work_struct. p9_xen_response checks if any responses are available, if so, it reads them one by one, calling p9_client_cb to send them up to the 9p layer (p9_client_cb completes the request). Handle the ring following the Xen 9pfs specification. CC: groug@kaod.org CC: jgross@suse.com CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/9pfs: send requests to the backendStefano Stabellini2017-05-021-2/+85
| | | | | | | | | | | | | | | | | | | | | | | | | Implement struct p9_trans_module create and close functions by looking at the available Xen 9pfs frontend-backend connections. We don't expect many frontend-backend connections, thus walking a list is OK. Send requests to the backend by copying each request to one of the available rings (each frontend-backend connection comes with multiple rings). Handle the ring and notifications following the 9pfs specification. If there are not enough free bytes on the ring for the request, wait on the wait_queue: the backend will send a notification after consuming more requests. CC: groug@kaod.org CC: jgross@suse.com CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/9pfs: connect to the backendStefano Stabellini2017-05-021-0/+281
| | | | | | | | | | | | | | | | | | | | | | Implement functions to handle the xenbus handshake. Upon connection, allocate the rings according to the protocol specification. Initialize a work_struct and a wait_queue. The work_struct will be used to schedule work upon receiving an event channel notification from the backend. The wait_queue will be used to wait when the ring is full and we need to send a new request. CC: groug@kaod.org CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/9pfs: introduce Xen 9pfs transport driverStefano Stabellini2017-05-021-0/+125
| | | | | | | | | | | | | | | | | | | | Introduce the Xen 9pfs transport driver: add struct xenbus_driver to register as a xenbus driver and add struct p9_trans_module to register as v9fs driver. All functions are empty stubs for now. CC: groug@kaod.org CC: jgross@suse.com CC: Eric Van Hensbergen <ericvh@gmail.com> CC: Ron Minnich <rminnich@sandia.gov> CC: Latchesar Ionkov <lucho@ionkov.net> CC: v9fs-developer@lists.sourceforge.net Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen: introduce the header file for the Xen 9pfs transport protocolStefano Stabellini2017-05-021-0/+36
| | | | | | | | | | | | | It uses the new ring.h macros to declare rings and interfaces. CC: konrad.wilk@oracle.com CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: groug@kaod.org Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen: import new ring macros in ring.hStefano Stabellini2017-05-021-0/+143
| | | | | | | | | | | | | | Sync the ring.h file with upstream Xen, to introduce the new ring macros. They will be used by the Xen transport for 9pfs. CC: konrad.wilk@oracle.com CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: groug@kaod.org Signed-off-by: Stefano Stabellini <stefano@aporeto.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: remove unused static function from smp_pv.cJuergen Gross2017-05-021-10/+0
| | | | | | xen_call_function_interrupt() isn't used in smp_pv.c. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: rename some PV-only functions in smp_pv.cVitaly Kuznetsov2017-05-021-14/+14
| | | | | | | | | | | After code split between PV and HVM some functions in xen_smp_ops have xen_pv_ prefix and some only xen_ which makes them look like they're common for both PV and HVM while they're not. Rename all the rest to have xen_pv_ prefix. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: enable PVHVM-only buildsVitaly Kuznetsov2017-05-022-4/+9
| | | | | | | | | Now everything is in place and we can move PV-only code under CONFIG_XEN_PV. CONFIG_XEN_PV_SMP is created to support the change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen: create xen_create/destroy_contiguous_region() stubs for PVHVM only buildsVitaly Kuznetsov2017-05-021-0/+14
| | | | | | | | | | xen_create_contiguous_region()/xen_create_contiguous_region() are PV-only, they both contain xen_feature(XENFEAT_auto_translated_physmap) check and bail in the very beginning. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* xen/balloon: decorate PV-only parts with #ifdef CONFIG_XEN_PVVitaly Kuznetsov2017-05-021-10/+20
| | | | | | | | | | Balloon driver uses several PV-only concepts (xen_start_info, xen_extra_mem,..) and it seems the simpliest solution to make HVM-only build happy is to decorate these parts with #ifdefs. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: create stubs for HVM-only builds in page.hVitaly Kuznetsov2017-05-021-0/+25
| | | | | | | | | | | | | | __pfn_to_mfn() is only used from PV code (mmu_pv.c, p2m.c) and from page.h where all functions calling it check for xen_feature(XENFEAT_auto_translated_physmap) first so we can replace it with any stub to make build happy. set_foreign_p2m_mapping()/clear_foreign_p2m_mapping() are used from grant-table.c but only if !xen_feature(XENFEAT_auto_translated_physmap). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: define startup_xen for XEN PV onlyVitaly Kuznetsov2017-05-021-0/+4
| | | | | | | | | startup_xen references PV-only code, decorate it with #ifdef CONFIG_XEN_PV to make PV-free builds possible. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: put setup.c, pmu.c and apic.c under CONFIG_XEN_PVVitaly Kuznetsov2017-05-023-4/+9
| | | | | | | | | xen_pmu_init/finish() functions are used in suspend.c and enlighten.c, add stubs for now. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split suspend.c for PV and PVHVM guestsVitaly Kuznetsov2017-05-025-55/+83
| | | | | | | | Slit the code in suspend.c into suspend_pv.c and suspend_hvm.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off mmu_pv.cVitaly Kuznetsov2017-05-023-2714/+2721
| | | | | | | | | Basically, mmu.c is renamed to mmu_pv.c and some code moved out to common mmu.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off mmu_hvm.cVitaly Kuznetsov2017-05-023-75/+80
| | | | | | | | Move PVHVM related code to mmu_hvm.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off smp_pv.cVitaly Kuznetsov2017-05-024-486/+509
| | | | | | | | | Basically, smp.c is renamed to smp_pv.c and some code moved out to common smp.c. struct xen_common_irq delcaration ended up in smp.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off smp_hvm.cVitaly Kuznetsov2017-05-025-54/+69
| | | | | | | | | | | Move PVHVM related code to smp_hvm.c. Drop 'static' qualifier from xen_smp_send_reschedule(), xen_smp_send_call_function_ipi(), xen_smp_send_call_function_single_ipi(), these functions will be moved to common smp code when smp_pv.c is split. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split xen_cpu_die()Vitaly Kuznetsov2017-05-021-6/+20
| | | | | | | | | Split xen_cpu_die() into xen_pv_cpu_die() and xen_hvm_cpu_die() to support further splitting of smp.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split xen_smp_prepare_boot_cpu()Vitaly Kuznetsov2017-05-021-19/+30
| | | | | | | | | Split xen_smp_prepare_boot_cpu() into xen_pv_smp_prepare_boot_cpu() and xen_hvm_smp_prepare_boot_cpu() to support further splitting of smp.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split xen_smp_intr_init()/xen_smp_intr_free()Vitaly Kuznetsov2017-05-023-11/+35
| | | | | | | | | | xen_smp_intr_init() and xen_smp_intr_free() have PV-specific code and as a praparatory change to splitting smp.c we need to split these fucntions. Create xen_smp_intr_init_pv()/xen_smp_intr_free_pv(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off enlighten_pv.cVitaly Kuznetsov2017-05-023-1528/+1538
| | | | | | | | | Basically, enlighten.c is renamed to enlighten_pv.c and some code moved out to common enlighten.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off enlighten_hvm.cVitaly Kuznetsov2017-05-024-207/+227
| | | | | | | | | | | Move PVHVM related code to enlighten_hvm.c. Three functions: xen_cpuhp_setup(), xen_reboot(), xen_emergency_restart() are shared, drop static qualifier from them. These functions will go to common code once it is split from enlighten.c. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: split off enlighten_pvh.cVitaly Kuznetsov2017-05-023-108/+116
| | | | | | | | | Create enlighten_pvh.c by splitting off PVH related code from enlighten.c, put it under CONFIG_XEN_PVH. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: add CONFIG_XEN_PV to KconfigVitaly Kuznetsov2017-05-023-7/+22
| | | | | | | | | All code to support Xen PV will get under this new option. For the beginning, check for it in the common code. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: globalize have_vcpu_info_placementVitaly Kuznetsov2017-05-022-6/+8
| | | | | | | | | | | have_vcpu_info_placement applies to both PV and HVM and as we're going to split the code we need to make it global. Rename to xen_have_vcpu_info_placement. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* x86/xen: separate PV and HVM hypervisorsVitaly Kuznetsov2017-05-023-41/+79
| | | | | | | | | | | | | | | | | As a preparation to splitting the code we need to untangle it: x86_hyper_xen -> x86_hyper_xen_hvm and x86_hyper_xen_pv xen_platform() -> xen_platform_hvm() and xen_platform_pv() xen_cpu_up_prepare() -> xen_cpu_up_prepare_pv() and xen_cpu_up_prepare_hvm() xen_cpu_dead() -> xen_cpu_dead_pv() and xen_cpu_dead_pv_hvm() Add two parameters to xen_cpuhp_setup() to pass proper cpu_up_prepare and cpu_dead hooks. xen_set_cpu_features() is now PV-only so the redundant xen_pv_domain() check can be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2017-05-0193-711/+1845
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
| * mm, zone_device: Replace {get, put}_zone_device_page() with a single ↵Dan Williams2017-05-015-30/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reference to fix pmem crash The x86 conversion to the generic GUP code included a small change which causes crashes and data corruption in the pmem code - not good. The root cause is that the /dev/pmem driver code implicitly relies on the x86 get_user_pages() implementation doing a get_page() on the page refcount, because get_page() does a get_zone_device_page() which properly refcounts pmem's separate page struct arrays that are not present in the regular page struct structures. (The pmem driver does this because it can cover huge memory areas.) But the x86 conversion to the generic GUP code changed the get_page() to page_cache_get_speculative() which is faster but doesn't do the get_zone_device_page() call the pmem code relies on. One way to solve the regression would be to change the generic GUP code to use get_page(), but that would slow things down a bit and punish other generic-GUP using architectures for an x86-ism they did not care about. (Arguably the pmem driver was probably not working reliably for them: but nvdimm is an Intel feature, so non-x86 exposure is probably still limited.) So restructure the pmem code's interface with the MM instead: get rid of the get/put_zone_device_page() distinction, integrate put_zone_device_page() into __put_page() and and restructure the pmem completion-wait and teardown machinery: Kirill points out that the calls to {get,put}_dev_pagemap() can be removed from the mm fast path if we take a single get_dev_pagemap() reference to signify that the page is alive and use the final put of the page to drop that reference. This does require some care to make sure that any waits for the percpu_ref to drop to zero occur *after* devm_memremap_page_release(), since it now maintains its own elevated reference. This speeds up things while also making the pmem refcounting more robust going forward. Suggested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mm: Fix flush_tlb_page() on XenAndy Lutomirski2017-04-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flush_tlb_page() passes a bogus range to flush_tlb_others() and expects the latter to fix it up. native_flush_tlb_others() has the fixup but Xen's version doesn't. Move the fixup to flush_tlb_others(). AFAICS the only real effect is that, without this fix, Xen would flush everything instead of just the one page on remote vCPUs in when flush_tlb_page() was called. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: e7b52ffd45a6 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range") Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mm: Make flush_tlb_mm_range() more predictableAndy Lutomirski2017-04-261-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm about to rewrite the function almost completely, but first I want to get a functional change out of the way. Currently, if flush_tlb_mm_range() does not flush the local TLB at all, it will never do individual page flushes on remote CPUs. This seems to be an accident, and preserving it will be awkward. Let's change it first so that any regressions in the rewrite will be easier to bisect and so that the rewrite can attempt to change no visible behavior at all. The fix is simple: we can simply avoid short-circuiting the calculation of base_pages_to_flush. As a side effect, this also eliminates a potential corner case: if tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range() could have ended up flushing the entire address space one page at a time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mm: Remove flush_tlb() and flush_tlb_current_task()Andy Lutomirski2017-04-262-26/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I was trying to figure out what how flush_tlb_current_task() would possibly work correctly if current->mm != current->active_mm, but I realized I could spare myself the effort: it has no callers except the unused flush_tlb() macro. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()Andy Lutomirski2017-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mark_screen_rdonly() is the last remaining caller of flush_tlb(). flush_tlb_mm_range() is potentially faster and isn't obsolete. Compile-tested only because I don't know whether software that uses this mechanism even exists. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mm/64: Fix crash in remove_pagetable()Kirill A. Shutemov2017-04-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | remove_pagetable() does page walk using p*d_page_vaddr() plus cast. It's not canonical approach -- we usually use p*d_offset() for that. It works fine as long as all page table levels are present. We broke the invariant by introducing folded p4d page table level. As result, remove_pagetable() interprets PMD as PUD and it leads to crash: BUG: unable to handle kernel paging request at ffff880300000000 IP: memchr_inv+0x60/0x110 PGD 317d067 P4D 317d067 PUD 3180067 PMD 33f102067 PTE 8000000300000060 Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for page walk. Reported-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: f2a6a7050109 ("x86: Convert the rest of the code to support p4d_t") Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() ↵Ingo Molnar2017-04-2312-128/+519
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implementation" This reverts commit 2947ba054a4dabbd82848728d765346886050029. Dan Williams reported dax-pmem kernel warnings with the following signature: WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200 percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic ... and bisected it to this commit, which suggests possible memory corruption caused by the x86 fast-GUP conversion. He also pointed out: " This is similar to the backtrace when we were not properly handling pud faults and was fixed with this commit: 220ced1676c4 "mm: fix get_user_pages() vs device-dax pud mappings" I've found some missing _devmap checks in the generic get_user_pages_fast() path, but this does not fix the regression [...] " So given that there are known bugs, and a pretty robust looking bisection points to this commit suggesting that are unknown bugs in the conversion as well, revert it for the time being - we'll re-try in v4.13. Reported-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dann.frazier@canonical.com Cc: dave.hansen@intel.com Cc: steve.capper@linaro.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/boot/e820: Remove a redundant self assignmentColin King2017-04-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a redundant self assignment of table->nr_entries, it does nothing and is an artifact of code simplification re-work. Detected by CoverityScan, CID#1428450 ("Self assignment") Fixes: 441ac2f33dd7 ("x86/boot/e820: Simplify e820__update_table()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: kernel-janitors@vger.kernel.org Cc: Denys Vlasenko <dvlasenk@redhat.com> Link: http://lkml.kernel.org/r/20170413155912.12078-1-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86/mm: Fix dump pagetables for 4 levels of page tablesJuergen Gross2017-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") introduced an error for dumping with only 4 levels by setting PGD_LEVEL_MULT to a wrong value. This is leading to e.g. addresses printed as "(null)" for ranges: x86/mm: Found insecure W+X mapping at address (null)/(null) Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD Fixes: fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadowJoerg Roedel2017-04-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check between the hardware state and our shadow of it is checked in the signal handler for all bounds exceptions, even for the ones where we don't keep the shadow up2date. This is a problem because when no shadow is kept the handler fails at this point and hides the real reason of the exception. Move the check into the code-path evaluating normal bounds exceptions to prevent this. Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kselftest@vger.kernel.org Link: http://lkml.kernel.org/r/1491488598-27346-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mpx: Correctly report do_mpx_bt_fault() failures to user-spaceJoerg Roedel2017-04-121-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When this function fails it just sends a SIGSEGV signal to user-space using force_sig(). This signal is missing essential information about the cause, e.g. the trap_nr or an error code. Fix this by propagating the error to the only caller of mpx_handle_bd_fault(), do_bounds(), which sends the correct SIGSEGV signal to the process. Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: fe3d197f84319 ('x86, mpx: On-demand kernel allocation of bounds tables') Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge branch 'x86/boot' into x86/mm, to avoid conflictIngo Molnar2017-04-1167-886/+1049
| |\ | | | | | | | | | | | | | | | | | | | | | There's a conflict between ongoing level-5 paging support and the E820 rewrite. Since the E820 rewrite is essentially ready, merge it into x86/mm to reduce tree conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"Thomas Gleixner2017-04-081-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 474aeffd88b87746a75583f356183d5c6caa4213 due to testing failures. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20170406124459.dwn5zhpr2xqg3lqm@node.shutemov.name
| * | x86/espfix: Add support for 5-level pagingKirill A. Shutemov2017-04-042-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need extra virtual address space for ESPFIX, so it stays within one PUD page table for both 4- and 5-level paging. Redefining ESPFIX_BASE_ADDR using P4D_SHIFT instead of PGDIR_SHIFT would make it stay in the same place regarding of paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-8-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/kasan: Extend KASAN to support 5-level pagingKirill A. Shutemov2017-04-041-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch bring support for a non-folded additional page table level. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=yKirill A. Shutemov2017-04-044-2/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extends pagetable headers to support the new paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/paravirt: Add 5-level support to the paravirt codeKirill A. Shutemov2017-04-044-14/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add operations to allocate/release p4ds. Xen requires more work. We will need to come back to it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mm: Define virtual memory map for 5-level pagingKirill A. Shutemov2017-04-046-8/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first part of memory map (up to %esp fixup) simply scales existing map for 4-level paging by factor of 9 -- number of bits addressed by the additional page table level. The rest of the map is unchanged. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170330080731.65421-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>