summaryrefslogtreecommitdiffstats
path: root/mm/vmalloc.c
Commit message (Collapse)AuthorAgeFilesLines
* mm/vmalloc.c: fix the comment of find_vm_areaHui Su2020-10-131-2/+2
| | | | | | | | | | Fix the comment of find_vm_area() and get_vm_area() Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200927153034.GA199877@rlk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc.c: update the comment in __vmalloc_area_node()Hui Su2020-10-131-1/+1
| | | | | | | | | | | | | Since c67dc624757 ("mm/vmalloc: do not call kmemleak_free() on not yet accounted memory"), the __vunmap() have been changed to __vfree(), so update the confusing comment(). Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Penyaev <rpenyaev@suse.de> Link: https://lkml.kernel.org/r/20200927155409.GA3315@rlk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vunmap: add cond_resched() in vunmap_pmd_rangeAneesh Kumar K.V2020-08-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like zap_pte_range add cond_resched so that we can avoid softlockups as reported below. On non-preemptible kernel with large I/O map region (like the one we get when using persistent memory with sector mode), an unmap of the namespace can report below softlockups. 22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777] NIP [c0000000000dc224] plpar_hcall+0x38/0x58 LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: flush_hash_page+0x114/0x200 hpte_need_flush+0x2dc/0x540 vunmap_page_range+0x538/0x6f0 free_unmap_vmap_area+0x30/0x70 remove_vm_area+0xfc/0x140 __vunmap+0x68/0x270 __iounmap.part.0+0x34/0x60 memunmap+0x54/0x70 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd8/0x260 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Reported-by: Harish Sriram <harish@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc.c: remove BUG() from the find_va_links()Uladzislau Rezki (Sony)2020-08-071-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of BUG() macro, that should be used only when a critical situation happens and a system is not able to function anymore. Replace it with WARN() macro instead, dump some extra information about start/end addresses of both VAs which overlap. Such overlap data can help to figure out what happened making further analysis easier. For example if both areas are identical it could mean a double free. A recovery process consists of declining all further steps regarding inserting of conflicting overlap range. In that sense find_va_links() now can return NULL, so its return value has to be checked by callers. Side effect of such process is it can leak memory, but it is better than just killing a machine for no good reason. Apart of that a debugging process can be done on alive system. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20200711104531.12242-1-urezki@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: vmalloc: remove redundant assignment in unmap_kernel_range_noflush()Mike Rapoport2020-08-071-1/+0
| | | | | | | | | | | | | 'addr' is set to 'start' and then a few lines afterwards 'start' is set to 'addr'. Remove the second asignment. Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Link: http://lkml.kernel.org/r/20200707163226.374685-1-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: update the header about KVA reworkUladzislau Rezki (Sony)2020-08-071-0/+1
| | | | | | | | | | | Reflect information about the author, date and year when the KVA rework was done. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200622195821.4796-1-urezki@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: switch to "propagate()" callbackUladzislau Rezki (Sony)2020-08-071-19/+6
| | | | | | | | | | | | | | | | | An augment_tree_propagate_from() function uses its own implementation that populates a tree from the specified node toward a root node. On the other hand the RB_DECLARE_CALLBACKS_MAX macro provides the "propagate()" callback that does exactly the same. Having two similar functions does not make sense and is redundant. Reuse "built in" functionality to the macros. So the code size gets reduced. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200527205054.1696-3-urezki@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: simplify augment_tree_propagate_check()Uladzislau Rezki (Sony)2020-08-071-34/+8
| | | | | | | | | | | | | | | | This function is for debug purpose only. Currently it uses recursion for tree traversal, checking an augmented value of each node to find out if it is valid or not. The recursion can corrupt the stack because the tree can be huge if synthetic tests are applied. To prevent it, navigate the tree from bottom to upper levels using a regular list instead, because nodes are linked among each other also. It is faster and without recursion. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200527205054.1696-2-urezki@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: simplify merge_or_add_vmap_area()Uladzislau Rezki (Sony)2020-08-071-11/+14
| | | | | | | | | | | | | | | | | Currently when a VA is deallocated and is about to be placed back to the tree, it can be either: merged with next/prev neighbors or inserted if not coalesced. On those steps the tree can be populated several times. For example when both neighbors are merged. It can be avoided and simplified in fact. Therefore do it only once when VA points to final merged area, after all manipulations: merging/removing/inserting. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200527205054.1696-1-urezki@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmalloc: convert to XArrayMatthew Wilcox (Oracle)2020-08-071-29/+11
| | | | | | | | | | | | The radix tree of vmap blocks is simpler to express as an XArray. Reduces both the text and data sizes of the object file and eliminates a user of the radix tree preload API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Link: http://lkml.kernel.org/r/20200603171448.5894-1-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: move p?d_alloc_track to separate header fileJoerg Roedel2020-08-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The functions are only used in two source files, so there is no need for them to be in the global <linux/mm.h> header. Move them to the new <linux/pgalloc-track.h> header and include it only where needed. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200609120533.25867-1-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove vmalloc_execChristoph Hellwig2020-06-261-20/+0
| | | | | | | | | | | | | | | | | | | Merge vmalloc_exec into its only caller. Note that for !CONFIG_MMU __vmalloc_node_range maps to __vmalloc, which directly clears the __GFP_HIGHMEM added by the vmalloc_exec stub anyway. Link: http://lkml.kernel.org/r/20200618064307.32739-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc.c: fix a warning while make xmldocsMasanari Iida2020-06-261-1/+0
| | | | | | | | | | | | | | | | This patch fixes following warning while "make xmldocs" mm/vmalloc.c:1877: warning: Excess function parameter 'prot' description in 'vm_map_ram' This warning started since commit d4efd79a81ab ("mm: remove the prot argument from vm_map_ram"). Link: http://lkml.kernel.org/r/20200622152850.140871-1-standby24x7@gmail.com Fixes: d4efd79a81ab ("mm: remove the prot argument from vm_map_ram") Signed-off-by: Masanari Iida <standby24x7@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: fix a typo in commentJeongtae Park2020-06-041-1/+1
| | | | | | | | | | | | | There is a typo in comment, fix it. "nother" -> "another" Signed-off-by: Jeongtae Park <jtp.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20200604185239.20765-1-jtp.park@samsung.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove vmalloc_sync_(un)mappings()Joerg Roedel2020-06-021-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | These functions are not needed anymore because the vmalloc and ioremap mappings are now synchronized when they are created or torn down. Remove all callers and function definitions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: track which page-table levels were modifiedJoerg Roedel2020-06-021-26/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | Track at which levels in the page-table entries were modified by vmap/vunmap. After the page-table has been modified, use that information do decide whether the new arch_sync_kernel_mappings() needs to be called. [akpm@linux-foundation.org: map_kernel_range_noflush() needs the arch_sync_kernel_mappings() call] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-3-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove vmalloc_user_node_flagsChristoph Hellwig2020-06-021-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open code it in __bpf_map_area_alloc, which is the only caller. Also clean up __bpf_map_area_alloc to have a single vmalloc call with slightly different flags instead of the current two different calls. For this to compile for the nommu case add a __vmalloc_node_range stub to nommu.c. [akpm@linux-foundation.org: fix nommu.c build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-27-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: switch the test_vmalloc module to use __vmalloc_nodeChristoph Hellwig2020-06-021-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to export the very low-level __vmalloc_node_range when the test module can use a slightly higher level variant. [akpm@linux-foundation.org: add missing `node' arg] [akpm@linux-foundation.org: fix riscv nommu build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-26-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove __vmalloc_node_flags_callerChristoph Hellwig2020-06-021-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just use __vmalloc_node instead which gets and extra argument. To be able to to use __vmalloc_node in all caller make it available outside of vmalloc and implement it in nommu.c. [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200414131348.444715-25-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove both instances of __vmalloc_node_flagsChristoph Hellwig2020-06-021-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The real version just had a few callers that can open code it and remove one layer of indirection. The nommu stub was public but only had a single caller, so remove it and avoid a CONFIG_MMU ifdef in vmalloc.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-24-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove the prot argument to __vmalloc_nodeChristoph Hellwig2020-06-021-21/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is always PAGE_KERNEL now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-23-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove the pgprot argument to __vmallocChristoph Hellwig2020-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: enforce that vmap can't map pages executableChristoph Hellwig2020-06-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To help enforcing the W^X protection don't allow remapping existing pages as executable. x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com>. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove the prot argument from vm_map_ramChristoph Hellwig2020-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is always PAGE_KERNEL - for long term mappings with other properties vmap should be used. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-19-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove unmap_vmap_areaChristoph Hellwig2020-06-021-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function just has a single caller, open code it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-18-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove map_vm_rangeChristoph Hellwig2020-06-021-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch all callers to map_kernel_range, which symmetric to the unmap side (as well as the _noflush versions). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-17-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: don't return the number of pages from map_kernel_range{,_noflush}Christoph Hellwig2020-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | None of the callers needs the number of pages, and a 0 / -errno return value is a lot more intuitive. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-16-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: rename vmap_page_range to map_kernel_rangeChristoph Hellwig2020-06-021-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This matches the map_kernel_range_noflush API. Also change to pass a size instead of the end, similar to the noflush version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-15-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove vmap_page_range_noflush and vunmap_page_rangeChristoph Hellwig2020-06-021-58/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These have non-static aliases called map_kernel_range_noflush and unmap_kernel_range_noflush that just differ slightly in the calling conventions that pass addr + size instead of an end. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-14-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: pass addr as unsigned long to vb_freeChristoph Hellwig2020-06-021-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ever use of addr in vb_free casts to unsigned long first, and the caller has an unsigned long version of the address available anyway. Just pass that and avoid all the casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-13-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: only allow page table mappings for built-in zsmallocChristoph Hellwig2020-06-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows to unexport map_vm_area and unmap_kernel_range, which are rather deep internal and should not be available to modules, as they for example allow fine grained control of mapping permissions, and also allow splitting the setup of a vmalloc area and the actual mapping and thus expose vmalloc internals. zsmalloc is typically built-in and continues to work (just like the percpu-vm code using a similar patter), while modular zsmalloc also continues to work, but must use copies. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-12-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: unexport unmap_kernel_range_noflushChristoph Hellwig2020-06-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no modular users of this function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-10-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove __get_vm_areaChristoph Hellwig2020-06-021-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch the two remaining callers to use __get_vm_area_caller instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-9-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmalloc: fix remap_vmalloc_range() bounds checksJann Horn2020-04-211-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | remap_vmalloc_range() has had various issues with the bounds checks it promises to perform ("This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma") over time, e.g.: - not detecting pgoff<<PAGE_SHIFT overflow - not detecting (pgoff<<PAGE_SHIFT)+usize overflow - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same vmalloc allocation - comparing a potentially wildly out-of-bounds pointer with the end of the vmalloc region In particular, since commit fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer dereferences by calling mmap() on a BPF map with a size that is bigger than the distance from the start of the BPF map to the end of the address space. This could theoretically be used as a kernel ASLR bypass, by using whether mmap() with a given offset oopses or returns an error code to perform a binary search over the possible address range. To allow remap_vmalloc_range_partial() to verify that addr and addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset to remap_vmalloc_range_partial() instead of adding it to the pointer in remap_vmalloc_range(). In remap_vmalloc_range_partial(), fix the check against get_vm_area_size() by using size comparisons instead of pointer comparisons, and add checks for pgoff. Fixes: 833423143c3a ("[PATCH] mm: introduce remap_vmalloc_range()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: fix a typo in commentQiujun Huang2020-04-071-1/+1
| | | | | | | | | | | There is a typo in comment, fix it. "exeeds" -> "exceeds" Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200404060136.10838-1-hqjagain@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86/mm: split vmalloc_sync_all()Joerg Roedel2020-03-211-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'v5.5-rc7' into efi/core, to pick up fixesIngo Molnar2020-01-201-2/+2
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * mm, debug_pagealloc: don't rely on static keys too earlyVlastimil Babka2020-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'x86/mm' into efi/core, to pick up dependenciesIngo Molnar2020-01-101-0/+8
|\ \ | |/ |/| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * mm, x86/mm: Untangle address space layout definitions from basic pgtable ↵Ingo Molnar2019-12-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type definitions - Untangle the somewhat incestous way of how VMALLOC_START is used all across the kernel, but is, on x86, defined deep inside one of the lowest level page table headers. It doesn't help that vmalloc.h only includes a single asm header: #include <asm/page.h> /* pgprot_t */ So there was no existing cross-arch way to decouple address layout definitions from page.h details. I used this: #ifndef VMALLOC_START # include <asm/vmalloc.h> #endif This way every architecture that wants to simplify page.h can do so. - Also on x86 we had a couple of LDT related inline functions that used the late-stage address space layout positions - but these could be uninlined without real trouble - the end result is cleaner this way as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | kasan: don't assume percpu shadow allocations will succeedDaniel Axtens2019-12-171-10/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller and the fault injector showed that I was wrong to assume that we could ignore percpu shadow allocation failures. Handle failures properly. Merge all the allocated areas back into the free list and release the shadow, then clean up and return NULL. The shadow is released unconditionally, which relies upon the fact that the release function is able to tolerate pages not being present. Also clean up shadows in the recovery path - currently they are not released, which leaks a bit of memory. Link: http://lkml.kernel.org/r/20191205140407.1874-3-dja@axtens.net Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Daniel Axtens <dja@axtens.net> Reported-by: syzbot+82e323920b78d54aaed5@syzkaller.appspotmail.com Reported-by: syzbot+59b7daa4315e07a994f1@syzkaller.appspotmail.com Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kasan: fix crashes on access to memory mapped by vm_map_ram()Andrey Ryabinin2019-12-171-45/+40
|/ | | | | | | | | | | | | | | | | | | | | | With CONFIG_KASAN_VMALLOC=y any use of memory obtained via vm_map_ram() will crash because there is no shadow backing that memory. Instead of sprinkling additional kasan_populate_vmalloc() calls all over the vmalloc code, move it into alloc_vmap_area(). This will fix vm_map_ram() and simplify the code a bit. [aryabinin@virtuozzo.com: v2] Link: http://lkml.kernel.org/r/20191205095942.1761-1-aryabinin@virtuozzo.comLink: http://lkml.kernel.org/r/20191204204534.32202-1-aryabinin@virtuozzo.com Fixes: 3c5c3cfb9ef4 ("kasan: support backing vmalloc space with real shadow memory") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Alexander Potapenko <glider@google.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kasan: support backing vmalloc space with real shadow memoryDaniel Axtens2019-12-011-8/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kasan: support backing vmalloc space with real shadow memory", v11. Currently, vmalloc space is backed by the early shadow page. This means that kasan is incompatible with VMAP_STACK. This series provides a mechanism to back vmalloc space with real, dynamically allocated memory. I have only wired up x86, because that's the only currently supported arch I can work with easily, but it's very easy to wire up other architectures, and it appears that there is some work-in-progress code to do this on arm64 and s390. This has been discussed before in the context of VMAP_STACK: - https://bugzilla.kernel.org/show_bug.cgi?id=202009 - https://lkml.org/lkml/2018/7/22/198 - https://lkml.org/lkml/2019/7/19/822 In terms of implementation details: Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow pages, mappings would have to be aligned to KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE. Instead, share backing space across multiple mappings. Allocate a backing page when a mapping in vmalloc space uses a particular page of the shadow region. This page can be shared by other vmalloc mappings later on. We hook in to the vmap infrastructure to lazily clean up unused shadow memory. Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that: - Turning on KASAN, inline instrumentation, without vmalloc, introuduces a 4.1x-4.2x slowdown in vmalloc operations. - Turning this on introduces the following slowdowns over KASAN: * ~1.76x slower single-threaded (test_vmalloc.sh performance) * ~2.18x slower when both cpus are performing operations simultaneously (test_vmalloc.sh sequential_test_order=1) This is unfortunate but given that this is a debug feature only, not the end of the world. The benchmarks are also a stress-test for the vmalloc subsystem: they're not indicative of an overall 2x slowdown! This patch (of 4): Hook into vmalloc and vmap, and dynamically allocate real shadow memory to back the mappings. Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow pages, mappings would have to be aligned to KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE. Instead, share backing space across multiple mappings. Allocate a backing page when a mapping in vmalloc space uses a particular page of the shadow region. This page can be shared by other vmalloc mappings later on. We hook in to the vmap infrastructure to lazily clean up unused shadow memory. To avoid the difficulties around swapping mappings around, this code expects that the part of the shadow region that covers the vmalloc space will not be covered by the early shadow page, but will be left unmapped. This will require changes in arch-specific code. This allows KASAN with VMAP_STACK, and may be helpful for architectures that do not have a separate module space (e.g. powerpc64, which I am currently working on). It also allows relaxing the module alignment back to PAGE_SIZE. Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that: - Turning on KASAN, inline instrumentation, without vmalloc, introuduces a 4.1x-4.2x slowdown in vmalloc operations. - Turning this on introduces the following slowdowns over KASAN: * ~1.76x slower single-threaded (test_vmalloc.sh performance) * ~2.18x slower when both cpus are performing operations simultaneously (test_vmalloc.sh sequential_test_order=3D1) This is unfortunate but given that this is a debug feature only, not the end of the world. The full benchmark results are: Performance No KASAN KASAN original x baseline KASAN vmalloc x baseline x KASAN fix_size_alloc_test 662004 11404956 17.23 19144610 28.92 1.68 full_fit_alloc_test 710950 12029752 16.92 13184651 18.55 1.10 long_busy_list_alloc_test 9431875 43990172 4.66 82970178 8.80 1.89 random_size_alloc_test 5033626 23061762 4.58 47158834 9.37 2.04 fix_align_alloc_test 1252514 15276910 12.20 31266116 24.96 2.05 random_size_align_alloc_te 1648501 14578321 8.84 25560052 15.51 1.75 align_shift_alloc_test 147 830 5.65 5692 38.72 6.86 pcpu_alloc_test 80732 125520 1.55 140864 1.74 1.12 Total Cycles 119240774314 763211341128 6.40 1390338696894 11.66 1.82 Sequential, 2 cpus No KASAN KASAN original x baseline KASAN vmalloc x baseline x KASAN fix_size_alloc_test 1423150 14276550 10.03 27733022 19.49 1.94 full_fit_alloc_test 1754219 14722640 8.39 15030786 8.57 1.02 long_busy_list_alloc_test 11451858 52154973 4.55 107016027 9.34 2.05 random_size_alloc_test 5989020 26735276 4.46 68885923 11.50 2.58 fix_align_alloc_test 2050976 20166900 9.83 50491675 24.62 2.50 random_size_align_alloc_te 2858229 17971700 6.29 38730225 13.55 2.16 align_shift_alloc_test 405 6428 15.87 26253 64.82 4.08 pcpu_alloc_test 127183 151464 1.19 216263 1.70 1.43 Total Cycles 54181269392 308723699764 5.70 650772566394 12.01 2.11 fix_size_alloc_test 1420404 14289308 10.06 27790035 19.56 1.94 full_fit_alloc_test 1736145 14806234 8.53 15274301 8.80 1.03 long_busy_list_alloc_test 11404638 52270785 4.58 107550254 9.43 2.06 random_size_alloc_test 6017006 26650625 4.43 68696127 11.42 2.58 fix_align_alloc_test 2045504 20280985 9.91 50414862 24.65 2.49 random_size_align_alloc_te 2845338 17931018 6.30 38510276 13.53 2.15 align_shift_alloc_test 472 3760 7.97 9656 20.46 2.57 pcpu_alloc_test 118643 132732 1.12 146504 1.23 1.10 Total Cycles 54040011688 309102805492 5.72 651325675652 12.05 2.11 [dja@axtens.net: fixups] Link: http://lkml.kernel.org/r/20191120052719.7201-1-dja@axtens.net Link: https://bugzilla.kernel.org/show_bug.cgi?id=3D202009 Link: http://lkml.kernel.org/r/20191031093909.9228-2-dja@axtens.net Signed-off-by: Mark Rutland <mark.rutland@arm.com> [shadow rework] Signed-off-by: Daniel Axtens <dja@axtens.net> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: rework vmap_area_lockUladzislau Rezki (Sony)2019-12-011-30/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new allocation approach introduced in the 5.2 kernel, it becomes possible to get rid of one global spinlock. By doing that we can further improve the KVA from the performance point of view. Basically we can have two independent locks, one for allocation part and another one for deallocation, because of two different entities: "free data structures" and "busy data structures". As a result, allocation/deallocation operations can still interfere between each other in case of running simultaneously on different CPUs, it means there is still dependency, but with two locks it becomes lower. Summarizing: - it reduces the high lock contention - it allows to perform operations on "free" and "busy" trees in parallel on different CPUs. Please note it does not solve scalability issue. Test results: In order to evaluate this patch, we can run "vmalloc test driver" to see how many CPU cycles it takes to complete all test cases running sequentially. All online CPUs run it so it will cause a high lock contention. HiKey 960, ARM64, 8xCPUs, big.LITTLE: <snip> sudo ./test_vmalloc.sh sequential_test_order=1 <snip> <default> [ 390.950557] All test took CPU0=457126382 cycles [ 391.046690] All test took CPU1=454763452 cycles [ 391.128586] All test took CPU2=454539334 cycles [ 391.222669] All test took CPU3=455649517 cycles [ 391.313946] All test took CPU4=388272196 cycles [ 391.410425] All test took CPU5=384036264 cycles [ 391.492219] All test took CPU6=387432964 cycles [ 391.578433] All test took CPU7=387201996 cycles <default> <patched> [ 304.721224] All test took CPU0=391521310 cycles [ 304.821219] All test took CPU1=393533002 cycles [ 304.917120] All test took CPU2=392243032 cycles [ 305.008986] All test took CPU3=392353853 cycles [ 305.108944] All test took CPU4=297630721 cycles [ 305.196406] All test took CPU5=297548736 cycles [ 305.288602] All test took CPU6=297092392 cycles [ 305.381088] All test took CPU7=297293597 cycles <patched> ~14%-23% patched variant is better. Link: http://lkml.kernel.org/r/20191022155800.20468-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: add more comments to the adjust_va_to_fit_type()Uladzislau Rezki (Sony)2019-12-011-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | When fit type is NE_FIT_TYPE there is a need in one extra object. Usually the "ne_fit_preload_node" per-CPU variable has it and there is no need in GFP_NOWAIT allocation, but there are exceptions. This commit just adds more explanations, as a result giving answers on questions like when it can occur, how often, under which conditions and what happens if GFP_NOWAIT gets failed. Link: http://lkml.kernel.org/r/20191016095438.12391-3-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: respect passed gfp_mask when doing preloadingUladzislau Rezki (Sony)2019-12-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Allocation functions should comply with the given gfp_mask as much as possible. The preallocation code in alloc_vmap_area doesn't follow that pattern and it is using a hardcoded GFP_KERNEL. Although this doesn't really make much difference because vmalloc is not GFP_NOWAIT compliant in general (e.g. page table allocations are GFP_KERNEL) there is no reason to spread that bad habit and it is good to fix the antipattern. [mhocko@suse.com: rewrite changelog] Link: http://lkml.kernel.org/r/20191016095438.12391-2-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: remove preempt_disable/enable when doing preloadingUladzislau Rezki (Sony)2019-12-011-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some background. The preemption was disabled before to guarantee that a preloaded object is available for a CPU, it was stored for. That was achieved by combining the disabling the preemption and taking the spin lock while the ne_fit_preload_node is checked. The aim was to not allocate in atomic context when spinlock is taken later, for regular vmap allocations. But that approach conflicts with CONFIG_PREEMPT_RT philosophy. It means that calling spin_lock() with disabled preemption is forbidden in the CONFIG_PREEMPT_RT kernel. Therefore, get rid of preempt_disable() and preempt_enable() when the preload is done for splitting purpose. As a result we do not guarantee now that a CPU is preloaded, instead we minimize the case when it is not, with this change, by populating the per cpu preload pointer under the vmap_area_lock. This implies that at least each caller that has done the preallocation will not fallback to an atomic allocation later. It is possible that the preallocation would be pointless or that no preallocation is done because of the race but the data shows that this is really rare. For example i run the special test case that follows the preload pattern and path. 20 "unbind" threads run it and each does 1000000 allocations. Only 3.5 times among 1000000 a CPU was not preloaded. So it can happen but the number is negligible. [mhocko@suse.com: changelog additions] Link: http://lkml.kernel.org/r/20191016095438.12391-1-urezki@gmail.com Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Daniel Wagner <dwagner@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc.c: remove unnecessary highmem_mask from parameter of ↵Liu Xiang2019-12-011-1/+1
| | | | | | | | | | | | | gfpflags_allow_blocking() gfpflags_allow_blocking() does not care about __GFP_HIGHMEM, so highmem_mask can be removed. Link: http://lkml.kernel.org/r/1568812319-3467-1-git-send-email-liuxiang_1999@126.com Signed-off-by: Liu Xiang <liuxiang_1999@126.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bpf: Add mmap() support for BPF_MAP_TYPE_ARRAYAndrii Nakryiko2019-11-181-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add ability to memory-map contents of BPF array map. This is extremely useful for working with BPF global data from userspace programs. It allows to avoid typical bpf_map_{lookup,update}_elem operations, improving both performance and usability. There had to be special considerations for map freezing, to avoid having writable memory view into a frozen map. To solve this issue, map freezing and mmap-ing is happening under mutex now: - if map is already frozen, no writable mapping is allowed; - if map has writable memory mappings active (accounted in map->writecnt), map freezing will keep failing with -EBUSY; - once number of writable memory mappings drops to zero, map freezing can be performed again. Only non-per-CPU plain arrays are supported right now. Maps with spinlocks can't be memory mapped either. For BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc() to be mmap()'able. We also need to make sure that array data memory is page-sized and page-aligned, so we over-allocate memory in such a way that struct bpf_array is at the end of a single page of memory with array->value being aligned with the start of the second page. On deallocation we need to accomodate this memory arrangement to free vmalloc()'ed memory correctly. One important consideration regarding how memory-mapping subsystem functions. Memory-mapping subsystem provides few optional callbacks, among them open() and close(). close() is called for each memory region that is unmapped, so that users can decrease their reference counters and free up resources, if necessary. open() is *almost* symmetrical: it's called for each memory region that is being mapped, **except** the very first one. So bpf_map_mmap does initial refcnt bump, while open() will do any extra ones after that. Thus number of close() calls is equal to number of open() calls plus one more. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/bpf/20191117172806.2195367-4-andriin@fb.com
* augmented rbtree: add new RB_DECLARE_CALLBACKS_MAX macroMichel Lespinasse2019-09-251-3/+2
| | | | | | | | | | | | | | | | | | | | | Add RB_DECLARE_CALLBACKS_MAX, which generates augmented rbtree callbacks for the case where the augmented value is a scalar whose definition follows a max(f(node)) pattern. This actually covers all present uses of RB_DECLARE_CALLBACKS, and saves some (source) code duplication in the various RBCOMPUTE function definitions. [walken@google.com: fix mm/vmalloc.c] Link: http://lkml.kernel.org/r/CANN689FXgK13wDYNh1zKxdipeTuALG4eKvKpsdZqKFJ-rvtGiQ@mail.gmail.com [walken@google.com: re-add check to check_augmented()] Link: http://lkml.kernel.org/r/20190727022027.GA86863@google.com Link: http://lkml.kernel.org/r/20190703040156.56953-3-walken@google.com Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>