path: root/mm
Commit message | Author | Age | Files | Lines
* ksm: count ksm merging pages for each process | xu xin | 2022-04-28 | 1 | -0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some applications or containers want to use KSM by calling madvise() to advise areas of their address space to be MERGEABLE. But they may not know which applications are more likely to cause real merges in the deployment. If this patch is applied, it helps them know their corresponding number of merged pages, and they can then optimize their application code. Since current KSM only counts the number of KSM merging pages (e.g. ksm_pages_sharing and ksm_pages_shared) for the whole system, we cannot see the more fine-grained KSM merging; for upper-level application optimization, the merging areas cannot be set easily according to the KSM page merging probability of each process. Therefore, it is necessary to add extra statistics so that upper-level users can know the detailed KSM merging information of each process. We add a new proc file named ksm_merging_pages under /proc/<pid>/ to report the number of KSM merged pages involved for this process. [akpm@linux-foundation.org: fix comment typo, remove BUG_ON()s] Link: https://lkml.kernel.org/r/20220325082318.2352853-1-xu.xin16@zte.com.cn Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Ohhoon Kwon <ohoono.kwon@samsung.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Feng Tang <feng.tang@intel.com> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
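For illustration only (not part of the patch): a minimal userspace sketch of how a process might mark a region MERGEABLE and then read the per-process counter this patch adds. The /proc/<pid>/ksm_merging_pages path comes from the message above; everything else is ordinary userspace C, and ksmd must be running (echo 1 > /sys/kernel/mm/ksm/run) for the count to become non-zero.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            size_t len = 64 * 1024 * 1024;
            char path[64];

            /* Anonymous mapping filled with identical data, so its pages can merge. */
            char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED)
                    return 1;
            memset(buf, 0xa5, len);

            /* Advise KSM that this range may be merged. */
            if (madvise(buf, len, MADV_MERGEABLE))
                    perror("madvise(MADV_MERGEABLE)");

            sleep(10);      /* give ksmd a chance to scan and merge */

            /* Read the per-process counter introduced by this patch. */
            snprintf(path, sizeof(path), "/proc/%d/ksm_merging_pages", getpid());
            FILE *f = fopen(path, "r");
            if (f) {
                    unsigned long merged;
                    if (fscanf(f, "%lu", &merged) == 1)
                            printf("ksm_merging_pages: %lu\n", merged);
                    fclose(f);
            }
            munmap(buf, len);
            return 0;
    }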
* mm/page_alloc: reuse tail struct pages for compound devmaps | Joao Martins | 2022-04-28 | 1 | -1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memmap_init_zone_device() ends up initializing 32768 pages when it only needs to initialize 128 given tail page reuse. That number is worse with 1GB compound pages, 262144 instead of 128. Update memmap_init_zone_device() to skip redundant initialization, detailed below. When a pgmap @vmemmap_shift is set, all pages are mapped at a given huge page alignment and use compound pages to describe them as opposed to a struct per 4K. With @vmemmap_shift > 0 and when struct pages are stored in ram (!altmap) most tail pages are reused. Consequently, the amount of unique struct pages is a lot smaller than the total amount of struct pages being mapped. The altmap path is left alone since it does not support memory savings based on compound pages devmap. Link: https://lkml.kernel.org/r/20220420155310.9712-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/sparse-vmemmap: improve memory savings for compound devmaps | Joao Martins | 2022-04-28 | 2 | -10/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A compound devmap is a dev_pagemap with @vmemmap_shift > 0, meaning that pages are mapped at a given huge page alignment and use compound pages, as opposed to order-0 pages. Take advantage of the fact that most tail pages look the same (except the first two) to minimize struct page overhead. Allocate a separate page for the vmemmap area which contains the head page, and a separate one for the next 64 pages. The rest of the subsections then reuse this tail vmemmap page to initialize the rest of the tail pages. Sections are arch-dependent (e.g. on x86 they are 64M, 128M or 512M) and when initializing a compound devmap with a big enough @vmemmap_shift (e.g. 1G PUD) it may cross multiple sections. The vmemmap code needs to consult @pgmap so that multiple sections that all map the same tail data can refer back to the first copy of that data for a given gigantic page. On compound devmaps with 2M alignment, this mechanism lets 6 pages be saved out of the 8 PFNs necessary to describe the subsection's 512 struct pages being mapped. On a 1G compound devmap it saves 4094 pages. Altmap isn't supported yet, given various restrictions in the altmap pfn allocator, so fall back to the already in use vmemmap_populate(). It is worth noting that altmap for devmap mappings was there to relieve the pressure of inordinate amounts of memmap space to map terabytes of pmem. With compound pages the motivation for altmaps for pmem is reduced. Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
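The 6-page and 4094-page figures above follow from simple arithmetic. A small standalone sketch, assuming the typical 64-byte struct page and 4K base pages backing the vmemmap:

    #include <stdio.h>

    int main(void)
    {
            const unsigned long struct_page_sz = 64;    /* bytes per struct page (typical) */
            const unsigned long base_page_sz = 4096;    /* 4K pages back the vmemmap */
            const unsigned long compound_sizes[] = { 2UL << 20, 1UL << 30 };  /* 2M, 1G */

            for (int i = 0; i < 2; i++) {
                    unsigned long nr_struct_pages = compound_sizes[i] / base_page_sz;
                    unsigned long vmemmap_pages = nr_struct_pages * struct_page_sz / base_page_sz;

                    /* the page holding the head struct page plus one shared tail page stay unique */
                    printf("%lu MiB compound page: %lu vmemmap pages, %lu saved\n",
                           compound_sizes[i] >> 20, vmemmap_pages, vmemmap_pages - 2);
            }
            return 0;
    }

This prints 8 pages / 6 saved for 2M and 4096 pages / 4094 saved for 1G, matching the message above.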
* mm/hugetlb_vmemmap: move comment block to Documentation/vm | Joao Martins | 2022-04-28 | 1 | -167/+1
| | | | | | | | | | | | | | | | | | | | In preparation for device-dax to use the hugetlbfs compound page tail deduplication technique, move the comment block explanation into a common place in Documentation/vm. Link: https://lkml.kernel.org/r/20220420155310.9712-4-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper | Joao Martins | 2022-04-28 | 1 | -17/+36
| | | | | | | | | | | | | | | | | | | | | | In preparation for describing a memmap with compound pages, move the actual pte population logic into a separate function, vmemmap_populate_address(), and have a new helper vmemmap_populate_range() walk through all the base pages it needs to populate. While doing that, change the helper to use a pte_t* as the return value, rather than a hardcoded errno of 0 or -ENOMEM. Link: https://lkml.kernel.org/r/20220420155310.9712-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/sparse-vmemmap: add a pgmap argument to section activation | Joao Martins | 2022-04-28 | 3 | -12/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9. This series minimizes 'struct page' overhead by pursuing a similar approach to Muchun Song's series "Free some vmemmap pages of hugetlb page" (merged since v5.14), but applied to devmaps with @vmemmap_shift (device-dax). The original vmemmap deduplication idea (already used in HugeTLB) is to reuse/deduplicate tail page vmemmap areas, in particular the areas which only describe tail pages. So a vmemmap page describes 64 struct pages, and the first page for a given ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second vmemmap page would contain only tail pages, and that is what gets reused across the rest of the subsection/section. The bigger the page size, the bigger the savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages). This is done for PMEM /specifically only/ on device-dax configured namespaces, not fsdax. In other words, a devmap with a @vmemmap_shift. In terms of savings, per 1Tb of memory, the struct page cost would go down with compound devmaps: * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory) * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of total memory) The series is mostly summed up by patch 4; to summarize what the series does: Patches 1 - 3: Minor cleanups in preparation for patch 4. Move the very nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry. Patch 4: Patch 4 is the one that takes care of the struct page savings (also referred to here as tail-page/vmemmap deduplication). Much like Muchun's series, we reuse the second PTE tail page vmemmap areas across a given @vmemmap_shift. One important difference, though, is that contrary to the hugetlbfs series, there is no pre-existing vmemmap for the area, because we are late-populating it as opposed to remapping a system-ram range. IOW, there is no freeing of pages of an already initialized vmemmap as in the hugetlbfs case, which greatly simplifies the logic (besides not being arch-specific). The altmap case is unchanged and still goes via vmemmap_populate(). Also adjust the newly added docs to the device-dax case. [Note that device-dax is still a little behind HugeTLB in terms of savings. I have an additional simple patch that reuses the head vmemmap page too, as a follow-up. That will double the savings and further improve namespace initialization.] Patch 5: Initialize fewer struct pages depending on the page size, for DRAM backed struct pages -- because fewer pages are unique and most tail pages are reused (more so with a bigger vmemmap_shift). NVDIMM namespace bootstrap improves from ~268-358 ms to ~80-110/<1ms on 128G NVDIMMs with 2M and 1G respectively. And the needed struct page capacity will be 3.8x / 1071x smaller for 2M and 1G respectively. Tested on x86 with 1.5Tb of pmem (including pinning, and RDMA registration/deregistration scalability with 2M MRs). This patch (of 5): In support of using compound pages for devmap mappings, plumb the pgmap down to the vmemmap_populate implementation. Note that while altmap is retrievable from pgmap, the memory hotplug code passes altmap without pgmap[*], so both need to be independently plumbed.
So in addition to @altmap, pass @pgmap to the sparse section populate functions, namely sparse_add_section(), section_activate(), populate_section_memmap() and __populate_section_memmap(). Passing @pgmap allows __populate_section_memmap() both to fetch the vmemmap_shift for which memmap metadata is created and to let sparse-vmemmap fetch the pgmap ranges to correlate to a given section and pick whether to just reuse tail pages from past onlined sections. While at it, fix the kdoc for @altmap for sparse_add_section(). [*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/ Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
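A rough sketch of the plumbing, with only __populate_section_memmap() shown; the other helpers listed above gain the same extra parameter (signatures abbreviated, so treat this as an outline rather than the literal diff):

    /* Before: only the altmap was passed down. */
    struct page * __populate_section_memmap(unsigned long pfn, unsigned long nr_pages,
                                            int nid, struct vmem_altmap *altmap);

    /* After: the pgmap rides along, so sparse-vmemmap can see vmemmap_shift
     * and the pgmap ranges when deciding whether tail vmemmap pages from an
     * earlier section of the same compound page can be reused. */
    struct page * __populate_section_memmap(unsigned long pfn, unsigned long nr_pages,
                                            int nid, struct vmem_altmap *altmap,
                                            struct dev_pagemap *pgmap);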
* mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* | Muchun Song | 2022-04-28 | 4 | -7/+7
| | | | | | | | | | | | The word "free" is not expressive enough to describe the feature of optimizing vmemmap pages associated with each HugeTLB page, so rename this keyword to "optimize". In this patch, clean up the configs to make the code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: hugetlb_vmemmap: cleanup hugetlb_free_vmemmap_enabled* | Muchun Song | 2022-04-28 | 2 | -6/+6
| | | | | | | | | | | | The word "free" is not expressive enough to describe the feature of optimizing vmemmap pages associated with each HugeTLB page, so rename this keyword to "optimize". In this patch, clean up the static key and hugetlb_free_vmemmap_enabled() to make the code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: hugetlb_vmemmap: cleanup hugetlb_vmemmap related functions | Muchun Song | 2022-04-28 | 3 | -37/+35
| | | | | | | | | | | | | | | | | | | | | | | | Patch series "cleanup hugetlb_vmemmap". The word "free" is not expressive enough to describe the feature of optimizing vmemmap pages associated with each HugeTLB page; renaming this keyword to "optimize" is clearer. In this series, clean up the related code to make it more clear and expressive. This was suggested by David. This patch (of 3): The word "free" is not expressive enough to describe the feature of optimizing vmemmap pages associated with each HugeTLB page, so rename this keyword to "optimize". Also, some function names are prefixed with "huge_page" instead of "hugetlb", which is easily confused with THP. In this patch, clean up the related functions to make the code more clear and expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220404074652.68024-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/page_alloc.c: calc the right pfn if page size is not 4K | Ma Wupeng | 2022-04-28 | 1 | -1/+1
| | | | | | | | | | | | | | Previously, 0x100000 was used to check the 4G limit in find_zone_movable_pfns_for_nodes(). This is correct on x86 because the page size there can only be 4K. But 16K and 64K page sizes are available on arm64, so replace the constant with PHYS_PFN(SZ_4G). Link: https://lkml.kernel.org/r/20220414101314.1250667-8-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
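The reasoning is easiest to see numerically. A hedged sketch follows; the variable name in the condition is paraphrased from memory of find_zone_movable_pfns_for_nodes() and may not match the source exactly:

    /* A PFN limit of 0x100000 only means "4G" when PAGE_SIZE is 4K:
     *
     *   0x100000 pfns * 4K  page = 4G      (x86)
     *   0x100000 pfns * 16K page = 16G     (arm64, 16K pages)
     *   0x100000 pfns * 64K page = 64G     (arm64, 64K pages)
     *
     * PHYS_PFN(SZ_4G), i.e. SZ_4G >> PAGE_SHIFT, expresses the 4G boundary
     * independently of the configured page size.
     */
    if (usable_startpfn < PHYS_PFN(SZ_4G))      /* was: usable_startpfn < 0x100000 */
            mem_below_4gb_not_mirrored = true;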
* mm/mremap: avoid unneeded do_munmap call | Miaohe Lin | 2022-04-28 | 1 | -2/+2
| | | | | | | | | | | | When old_len == new_len, do_munmap() will return -EINVAL due to len == 0. This errno is then simply ignored because of the old_len != new_len check. So it is unnecessary to call do_munmap() when old_len == new_len, because nothing is actually done. Link: https://lkml.kernel.org/r/20220401081023.37080-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
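A condensed, paraphrased sketch of the shrinking path after the change; names follow mm/mremap.c, but the surrounding code is omitted and the exact hunk may differ:

    /*
     * Shrinking (or equal-size) path of mremap(): only call do_munmap()
     * when the region actually shrinks.  With old_len == new_len the old
     * code called do_munmap() with len == 0, got -EINVAL back, and then
     * discarded the error anyway.
     */
    if (old_len >= new_len) {
            if (old_len != new_len) {
                    ret = do_munmap(mm, addr + new_len, old_len - new_len,
                                    &uf_unmap);
                    if (ret)
                            goto out;
            }
            ret = addr;
            goto out;
    }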
* mm/mremap: use helper mlock_future_check() | Miaohe Lin | 2022-04-28 | 1 | -8/+2
| | | | | | | | | | Use helper mlock_future_check() to check whether it's safe to resize the locked_vm to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220322112004.27380-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
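For context, the helper being reused looks roughly like this (reproduced from memory of mm/mmap.c, so treat the details as approximate): it charges the new range against the process's RLIMIT_MEMLOCK only when the mapping is VM_LOCKED.

    static inline int mlock_future_check(struct mm_struct *mm,
                                         unsigned long flags, unsigned long len)
    {
            unsigned long locked, lock_limit;

            /* mlock limit only applies to VM_LOCKED mappings */
            if (flags & VM_LOCKED) {
                    locked = len >> PAGE_SHIFT;
                    locked += mm->locked_vm;
                    lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
                    if (locked > lock_limit && !capable(CAP_IPC_LOCK))
                            return -EAGAIN;
            }
            return 0;
    }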
* mm/mmap: drop arch_vm_get_page_prot() | Anshuman Khandual | 2022-04-28 | 1 | -3/+1
| | | | | | | | | | | | | | | | | | | | | | There are no platforms left which use arch_vm_get_page_prot(). Just drop generic arch_vm_get_page_prot(). Link: https://lkml.kernel.org/r/20220414062125.609297-8-anshuman.khandual@arm.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmap: drop arch_filter_pgprot() | Anshuman Khandual | 2022-04-28 | 2 | -14/+2
| | | | | | | | | | | | | | | | | | | There are no platforms left which subscribe ARCH_HAS_FILTER_PGPROT. Hence drop generic arch_filter_pgprot() and also config ARCH_HAS_FILTER_PGPROT. Link: https://lkml.kernel.org/r/20220414062125.609297-7-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmap: add new config ARCH_HAS_VM_GET_PAGE_PROT | Anshuman Khandual | 2022-04-28 | 2 | -0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/mmap: Drop arch_vm_get_page_prot() and arch_filter_pgprot()", v7. protection_map[] is an array based construct that translates a given vm_flags combination. This array contains the page protection map, which is populated by the platform via the [__S000 .. __S111] and [__P000 .. __P111] exported macros. The primary usage of protection_map[] is in vm_get_page_prot(), which is used to determine the page protection value for a given vm_flags. The vm_get_page_prot() implementation could in turn call the platform overrides arch_vm_get_page_prot() and arch_filter_pgprot(). Some platforms override protection_map[], which was originally built with __SXXX/__PXXX, with different runtime values. Currently there are multiple layers of abstraction, i.e. the __SXXX/__PXXX macros, protection_map[], arch_vm_get_page_prot() and arch_filter_pgprot(), built between the platform and generic MM, finally defining vm_get_page_prot(). Hence this series proposes to drop the latter two abstraction levels and instead just move the responsibility of defining vm_get_page_prot() to the platform itself (still utilizing the generic protection_map[] array), making it clean and simple. This first introduces ARCH_HAS_VM_GET_PAGE_PROT, which enables platforms to define a custom vm_get_page_prot(), and then starts converting the platforms that define the overrides arch_filter_pgprot() or arch_vm_get_page_prot(), which allows those constructs to be dropped completely. The series has been inspired by an earlier discussion with Christoph Hellwig https://lore.kernel.org/all/1632712920-8171-1-git-send-email-anshuman.khandual@arm.com/ This patch (of 7): Add a new config ARCH_HAS_VM_GET_PAGE_PROT, which when subscribed to enables a given platform to define its own vm_get_page_prot() while still utilizing the generic protection_map[] array. Link: https://lkml.kernel.org/r/20220414062125.609297-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20220414062125.609297-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
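To make the layering concrete, a sketch of the generic fallback that remains when an architecture does not select the new config; this is abbreviated and not the verbatim patch (mm/Kconfig also gains a bare "config ARCH_HAS_VM_GET_PAGE_PROT / bool" entry):

    /* mm/mmap.c -- generic definition, built only when the architecture
     * does not select ARCH_HAS_VM_GET_PAGE_PROT and provide its own. */
    #ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT
    pgprot_t vm_get_page_prot(unsigned long vm_flags)
    {
            /* protection_map[] is still the single source of truth */
            return __pgprot(pgprot_val(protection_map[vm_flags &
                            (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)]));
    }
    EXPORT_SYMBOL(vm_get_page_prot);
    #endif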
* mm/mmap.c: use helper mlock_future_check() | Miaohe Lin | 2022-04-28 | 1 | -9/+2
| | | | | | | | | Use helper mlock_future_check() to check whether it's safe to enlarge the locked_vm to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220402032231.64974-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmap: clarify protection_map[] indices | Anshuman Khandual | 2022-04-28 | 1 | -2/+16
| | | | | | | | | | | | protection_map[] maps vm_flags access combinations into page protection values as defined by the platform via the __PXXX and __SXXX macros. The array indices in protection_map[] represent vm_flags access combinations, but they are not very intuitive to derive. This makes them clear and explicit. Link: https://lkml.kernel.org/r/20220404031840.588321-3-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
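The clarification is essentially spelling the array out with designated, vm_flags-based indices; an abbreviated sketch of the idea (only a few entries shown, storage qualifiers omitted):

    /* protection_map[] is indexed by the low four vm_flags bits,
     * i.e. some combination of VM_READ, VM_WRITE, VM_EXEC and VM_SHARED. */
    pgprot_t protection_map[16] = {
            [VM_NONE]                                       = __P000,
            [VM_READ]                                       = __P001,
            [VM_WRITE]                                      = __P010,
            [VM_WRITE | VM_READ]                            = __P011,
            [VM_EXEC]                                       = __P100,
            /* ... */
            [VM_SHARED | VM_EXEC | VM_WRITE | VM_READ]      = __S111
    };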
* mm/debug_vm_pgtable: drop protection_map[] usage | Anshuman Khandual | 2022-04-28 | 1 | -12/+19
| | | | | | | | | | | | | | | | | | Patch series "mm: protection_map[] cleanups". This patch (of 2): Although protection_map[] contains the platform defined page protection map for a given vm_flags combination, vm_get_page_prot() is the right interface to use. This will also reduce dependency on protection_map[] which is going to be dropped off completely later on. Link: https://lkml.kernel.org/r/20220404031840.588321-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20220404031840.588321-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush | Jianxing Wang | 2022-04-28 | 1 | -2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Freeing a large list of pages can cause rcu_sched starvation on non-preemptible kernels. However, free_unref_page_list() cannot simply cond_resched(), as it may be called in interrupt or atomic context; in particular, atomic context cannot be detected with CONFIG_PREEMPTION=n. The issue was detected in a guest with KVM CPU 200% overcommit; I did not see the warning on the host running the same application. I am sure that the patch is needed for the guest kernel, but not sure about the host. To reproduce, set up two virtual machines in one host machine, each VM with the same number of CPUs and half the memory of the host. Then run ltpstress.sh in each VM and the rcu stall warning appears. The kernel must have preemption disabled; append the kernel command line parameter 'preempt=none' if dynamic preemption is enabled. It could be detected on a Loongson machine (32 cores, 128G mem) and a ProLiant DL380 Gen9 (x86 E5-2680, 28 cores, 64G mem). The TLB flush batch count depends on PAGE_SIZE, so it is too large if PAGE_SIZE > 4K; limit the free batch count to 512 and add a schedule point in tlb_batch_pages_flush(). rcu: rcu_sched kthread starved for 5359 jiffies! g454793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=19 [...] Call Trace: free_unref_page_list+0x19c/0x270 release_pages+0x3cc/0x498 tlb_flush_mmu_free+0x44/0x70 zap_pte_range+0x450/0x738 unmap_page_range+0x108/0x240 unmap_vmas+0x74/0xf0 unmap_region+0xb0/0x120 do_munmap+0x264/0x438 vm_munmap+0x58/0xa0 sys_munmap+0x10/0x20 syscall_common+0x24/0x38 Link: https://lkml.kernel.org/r/20220317072857.2635262-1-wangjianxing@loongson.cn Signed-off-by: Jianxing Wang <wangjianxing@loongson.cn> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
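The fix described above amounts to capping each free call and rescheduling between chunks; roughly (a sketch of mm/mmu_gather.c, close to but not necessarily identical to the applied hunk):

    static void tlb_batch_pages_flush(struct mmu_gather *tlb)
    {
            struct mmu_gather_batch *batch;

            for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
                    struct page **pages = batch->pages;

                    do {
                            /* With 64K pages a batch can hold tens of thousands
                             * of entries; free at most 512 at a time and give
                             * the scheduler (and RCU) a chance in between. */
                            unsigned int nr = min(512U, batch->nr);

                            free_pages_and_swap_cache(pages, nr);
                            pages += nr;
                            batch->nr -= nr;

                            cond_resched();
                    } while (batch->nr);
            }
            tlb->active = &tlb->local;
    }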
* mm/mmap.c: use mmap_assert_write_locked() instead of open coding it | Rolf Eike Beer | 2022-04-28 | 1 | -2/+2
| | | | | | | | | In case the lock is actually not held at this point. Link: https://lkml.kernel.org/r/5827758.TJ1SttVevJ@mobilepool36.emlix.com Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: simplify follow_invalidate_pte() | Muchun Song | 2022-04-28 | 1 | -58/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The only user (DAX) of the range and pmdpp parameters of follow_invalidate_pte() is gone, so it is safe to remove them and make the function static to simplify the code. This effectively reverts the following commits: 097963959594 ("mm: add follow_pte_pmd()") a4d1a8852513 ("dax: update to new mmu_notifier semantic") There is only one caller of follow_invalidate_pte() left, so just fold it into follow_pte() and remove it. Link: https://lkml.kernel.org/r/20220403053957.10770-7-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: pvmw: add support for walking devmap pages | Muchun Song | 2022-04-28 | 1 | -8/+9
| | | | | | | | | | | | | | | | | | | | | | | The devmap pages can not use page_vma_mapped_walk() to check if a huge devmap page is mapped into a vma. Add support for walking huge devmap pages so that DAX can use it in the next patch. Link: https://lkml.kernel.org/r/20220403053957.10770-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: rmap: introduce pfn_mkclean_range() to clean PTEs | Muchun Song | 2022-04-28 | 2 | -20/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | page_mkclean_one() is supposed to be used with a pfn that has an associated struct page, but not all pfns (e.g. DAX) have one. Introduce a new function, pfn_mkclean_range(), to clean the PTEs (including PMDs) mapped with a range of pfns that have no struct page associated with them. This helper will be used by the DAX device in the next patch to make pfns clean. Link: https://lkml.kernel.org/r/20220403053957.10770-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: rmap: fix cache flush on THP pages | Muchun Song | 2022-04-28 | 1 | -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Fix some bugs related to rmap and dax", v7. Patches 1-2 fix a cache flush bug; because subsequent patches depend on those changes, they are placed in this series. Patches 3-4 are preparation for fixing a dax bug in patch 5. Patch 6 is code cleanup, since the previous patch removes the usage of follow_invalidate_pte(). This patch (of 6): flush_cache_page() only removes a PAGE_SIZE sized range from the cache. However, that does not cover the full extent of a THP except for its head page. Replace it with flush_cache_range() to fix this issue. No problems have been observed due to this so far, perhaps because few architectures have virtually indexed caches. Link: https://lkml.kernel.org/r/20220403053957.10770-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220403053957.10770-2-songmuchun@bytedance.com Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alistair Popple <apopple@nvidia.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Hugh Dickins <hughd@google.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
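Reduced to a sketch, the change in the PMD branch of the rmap clean walk is roughly the following (illustrative; the "was:" comment paraphrases the old call):

    /* was: flush_cache_page(vma, address, page_to_pfn(page));
     * which flushes only PAGE_SIZE -- not enough for a PMD-mapped THP */
    flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);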
* mm/madvise: fix potential pte_unmap_unlock pte error | Miaohe Lin | 2022-04-28 | 1 | -4/+4
| | | | | | | | | | | | | We can't assume pte_offset_map_lock() will return the same orig_pte value. So it is necessary to reacquire orig_pte, or pte_unmap_unlock() will unmap the stale pte. Link: https://lkml.kernel.org/r/20220416081416.23304-1-linmiaohe@huawei.com Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD") Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
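The bug pattern and the fix, reduced to a sketch of the madvise PTE walk (simplified; error paths and the actual walk body are omitted):

    orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    /* ... walk PTEs; on a THP the lock is dropped to split the page ... */
    get_page(page);
    pte_unmap_unlock(orig_pte, ptl);
    lock_page(page);
    err = split_huge_page(page);
    unlock_page(page);
    put_page(page);
    if (err)
            goto out;
    /*
     * Refresh orig_pte as well as pte when re-taking the lock:
     * pte_offset_map_lock() may map the PTE page at a different address
     * the second time, so passing the stale orig_pte to pte_unmap_unlock()
     * at the end of the walk would unmap the wrong thing.
     */
    orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
    /* ... continue the walk ... */
    pte_unmap_unlock(orig_pte, ptl);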
* mm: untangle config dependencies for demote-on-reclaim | Oscar Salvador | 2022-04-28 | 2 | -7/+6
| | | | | | | | | | | | | | | | | | At the time demote-on-reclaim was introduced, it was tied to CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate. The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE, so clean this up. Furthermore, we only register the hotplug memory notifier when the system has CONFIG_MEMORY_HOTPLUG. Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Abhishek Goel <huntbag@linux.vnet.ibm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: migrate: simplify the refcount validation when migrating hugetlb mapping | Baolin Wang | 2022-04-28 | 1 | -5/+0
| | | | | | | | | | | | | | | | | | | | There is no need to validate the hugetlb page's refcount before trying to freeze the hugetlb page's expected refcount, instead we can just rely on the page_ref_freeze() to simplify the validation. Moreover we are always under the page lock when migrating the hugetlb page mapping, which means nowhere else can remove it from the page cache, so we can remove the xas_load() validation under the i_pages lock. Link: https://lkml.kernel.org/r/eb2fbbeaef2b1714097b9dec457426d682ee0635.1649676424.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
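The resulting shape of migrate_huge_page_move_mapping() is roughly the following; the dropped pre-checks are noted in the comment. This is reconstructed from memory rather than the literal hunk, so treat it as a sketch:

    xas_lock_irq(&xas);
    /* No separate page_count() pre-check and no xas_load() re-validation:
     * page_ref_freeze() only succeeds when the refcount is exactly the
     * expected value, and the page lock keeps the mapping entry stable. */
    if (!page_ref_freeze(page, expected_count)) {
            xas_unlock_irq(&xas);
            return -EAGAIN;
    }

    newpage->index = page->index;
    newpage->mapping = page->mapping;
    get_page(newpage);

    xas_store(&xas, newpage);
    page_ref_unfreeze(page, expected_count - 1);
    xas_unlock_irq(&xas);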
* mm/migration: fix possible do_pages_stat_array racing with memory offline | Miaohe Lin | 2022-04-28 | 1 | -2/+7
| | | | | | | | | | | | | | | When follow_page peeks a page, the page could be migrated and then be offlined while it's still being used by the do_pages_stat_array(). Use FOLL_GET to hold the page refcnt to fix this potential race. Link: https://lkml.kernel.org/r/20220318111709.60311-12-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
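A sketch of the fixed lookup in do_pages_stat_array() (simplified; FOLL_DUMP is kept as in the existing code so special pages such as the zero page still report an error):

    /* FOLL_GET pins the page so it cannot be migrated and then
     * offlined while we are still looking at it. */
    page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);

    err = PTR_ERR(page);
    if (IS_ERR(page))
            goto set_status;

    if (page) {
            err = page_to_nid(page);
            put_page(page);         /* drop the reference taken by FOLL_GET */
    } else {
            err = -ENOENT;
    }
    set_status:
            *status = err;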
* mm/migration: fix potential invalid node access for reclaim-based migration | Miaohe Lin | 2022-04-28 | 1 | -3/+3
| | | | | | | | | | | | | | | | | If we failed to setup hotplug state callbacks for mm/demotion:online in some corner cases, node_demotion will be left uninitialized. Invalid node might be returned from the next_demotion_node() when doing reclaim-based migration. Use kcalloc to allocate node_demotion to fix the issue. Link: https://lkml.kernel.org/r/20220318111709.60311-11-linmiaohe@huawei.com Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
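The fix reduces to a zeroing allocation; a sketch follows (the struct layout is assumed from the surrounding code, so field names may differ):

    /* kcalloc() zeroes the table.  If the hotplug callbacks are never set
     * up, every entry's target count stays 0, so next_demotion_node()
     * returns NUMA_NO_NODE instead of reading uninitialized node numbers. */
    node_demotion = kcalloc(nr_node_ids, sizeof(struct demotion_nodes),
                            GFP_KERNEL);
    if (!node_demotion)
            return;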
* mm/migration: fix potential page refcounts leak in migrate_pages | Miaohe Lin | 2022-04-28 | 1 | -0/+8
| | | | | | | | | | | | | | | | | | In -ENOMEM case, there might be some subpages of fail-to-migrate THPs left in thp_split_pages list. We should move them back to migration list so that they could be put back to the right list by the caller otherwise the page refcnt will be leaked here. Also adjust nr_failed and nr_thp_failed accordingly to make vm events account more accurate. Link: https://lkml.kernel.org/r/20220318111709.60311-10-linmiaohe@huawei.com Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/migration: remove some duplicated codes in migrate_pages | Miaohe Lin | 2022-04-28 | 1 | -24/+12
| | | | | | | | | | | | | | Remove the duplicated codes in migrate_pages to simplify the code. Minor readability improvement. No functional change intended. Link: https://lkml.kernel.org/r/20220318111709.60311-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/migration: avoid unneeded nodemask_t initialization | Miaohe Lin | 2022-04-28 | 1 | -2/+2
| | | | | | | | | | | | | | | Avoid unneeded next_pass and this_pass initialization as they're always set before using to save possible cpu cycles when there are plenty of nodes in the system. Link: https://lkml.kernel.org/r/20220318111709.60311-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/migration: use helper macro min in do_pages_stat | Miaohe Lin | 2022-04-28 | 1 | -6/+2
| | | | | | | | | | | | | | We could use helper macro min to help set the chunk_nr to simplify the code. Link: https://lkml.kernel.org/r/20220318111709.60311-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
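The cleanup reads roughly as below; whether the constant needs the cast or an UL-suffixed definition depends on how DO_PAGES_STAT_CHUNK_NR is defined, so this is a sketch rather than the exact hunk:

    /* was:
     *      chunk_nr = nr_pages;
     *      if (chunk_nr > DO_PAGES_STAT_CHUNK_NR)
     *              chunk_nr = DO_PAGES_STAT_CHUNK_NR;
     */
    chunk_nr = min(nr_pages, (unsigned long)DO_PAGES_STAT_CHUNK_NR);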
* mm/migration: use helper function vma_lookup() in add_page_for_migration | Miaohe Lin | 2022-04-28 | 1 | -2/+2
| | | | | | | | | | | | | | We could use helper function vma_lookup() to lookup the needed vma to simplify the code. Link: https://lkml.kernel.org/r/20220318111709.60311-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
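A sketch of the simplification (the "was:" comment paraphrases the open-coded find_vma() check being replaced):

    /* was:
     *      vma = find_vma(mm, addr);
     *      if (!vma || addr < vma->vm_start || !vma_migratable(vma))
     *              goto out;
     * vma_lookup() already guarantees that addr lies inside the returned vma. */
    vma = vma_lookup(mm, addr);
    if (!vma || !vma_migratable(vma))
            goto out;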
* mm/migration: remove unneeded local variable page_lru | Miaohe Lin | 2022-04-28 | 1 | -3/+1
| | | | | | | | | | | | | | We can use page_is_file_lru() directly to help account the isolated pages to simplify the code a bit. Link: https://lkml.kernel.org/r/20220318111709.60311-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/migration: remove unneeded local variable mapping_locked | Miaohe Lin | 2022-04-28 | 1 | -4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "A few cleanup and fixup patches for migration", v2. This series contains a few patches to remove unneeded variables, jump label and use helper to simplify the code. Also we fix some bugs such as page refcounts leak , invalid node access and so on. More details can be found in the respective changelogs. This patch (of 11): When mapping_locked is true, TTU_RMAP_LOCKED is always set to ttu. We can check ttu instead so mapping_locked can be removed. And ttu is either 0 or TTU_RMAP_LOCKED now. Change '|=' to '=' to reflect this. Link: https://lkml.kernel.org/r/20220318111709.60311-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220318111709.60311-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/mempolicy: clean up the code logic in queue_pages_pte_range | Miaohe Lin | 2022-04-28 | 1 | -9/+3
| | | | | | | | | | | | | | Since commit e5947d23edd8 ("mm: mempolicy: don't have to split pmd for huge zero page"), THP is never split in queue_pages_pmd(), so 2 is never returned now. We can remove the unnecessary ret != 2 check and clean up the relevant comment. Minor improvements in readability. Link: https://lkml.kernel.org/r/20220419122234.45083-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: compaction: use helper isolation_suitable() | Miaohe Lin | 2022-04-28 | 1 | -1/+1
| | | | | | | | | | | Use helper isolation_suitable() to check whether page is suitable to isolate to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220322110750.60311-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle() | Miaohe Lin | 2022-04-28 | 1 | -3/+0
| | | | | | | | | | The only caller z3fold_free() never calls free_handle() in PAGE_HEADLESS case. Remove this unneeded check. Link: https://lkml.kernel.org/r/20220308134311.59086-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: remove redundant list_del_init of zhdr->buddy in z3fold_free | Miaohe Lin | 2022-04-28 | 1 | -3/+0
| | | | | | | | | | do_compact_page() will do list_del_init(&zhdr->buddy) for us. Remove this extra one to save some possible cpu cycles. Link: https://lkml.kernel.org/r/20220308134311.59086-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: move decrement of pool->pages_nr into __release_z3fold_page() | Miaohe Lin | 2022-04-28 | 1 | -29/+12
| | | | | | | | | | | | | | | | | | | | The z3fold will always do atomic64_dec(&pool->pages_nr) when the __release_z3fold_page() is called. Thus we can move decrement of pool->pages_nr into __release_z3fold_page() to simplify the code. Also we can reduce the size of z3fold.o ~1k. Without this patch: text data bss dec hex filename 15444 1376 8 16828 41bc mm/z3fold.o With this patch: text data bss dec hex filename 15044 1248 8 16300 3fac mm/z3fold.o Link: https://lkml.kernel.org/r/20220308134311.59086-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: remove confusing local variable l reassignment | Miaohe Lin | 2022-04-28 | 1 | -1/+0
| | | | | | | | | | The local variable l holds the address of unbuddied[i] which won't change after we take the pool lock. Remove it to avoid confusion. Link: https://lkml.kernel.org/r/20220308134311.59086-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: remove unneeded page_mapcount_reset and ClearPagePrivate | Miaohe Lin | 2022-04-28 | 1 | -3/+0
| | | | | | | | | | page->page_type and PagePrivate are not used in z3fold, so remove these confusing and unneeded operations. z3fold only did them because it was modeled on zsmalloc's migration code, which does need them. Link: https://lkml.kernel.org/r/20220308134311.59086-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: minor clean up for z3fold_free | Miaohe Lin | 2022-04-28 | 1 | -4/+4
| | | | | | | | | | Use put_z3fold_header() to pair with get_z3fold_header. Also fix the wrong comments. Minor readability improvement. Link: https://lkml.kernel.org/r/20220308134311.59086-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: remove obsolete comment in z3fold_alloc | Miaohe Lin | 2022-04-28 | 1 | -3/+0
| | | | | | | | | | The highmem pages are supported since commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"). Remove the residual comment. Link: https://lkml.kernel.org/r/20220308134311.59086-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/z3fold: declare z3fold_mount with __init | Miaohe Lin | 2022-04-28 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | Patch series "A few cleanup patches for z3fold", v2. This series contains a few patches to simplify the code, remove unneeded code, fix obsolete comment and so on. More details can be found in the respective changelogs. This patch (of 8): z3fold_mount is only called during init. So we should declare it with __init. Link: https://lkml.kernel.org/r/20220308134311.59086-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220308134311.59086-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vmscan: fix comment for isolate_lru_pages | Miaohe Lin | 2022-04-28 | 1 | -2/+2
| | | | | | | | | | | | Since commit 791b48b64232 ("mm: vmscan: scan until it finds eligible pages"), splicing any skipped pages to the tail of the LRU list won't put the system at risk of premature OOM but will waste lots of cpu cycles. Correct the comment accordingly. Link: https://lkml.kernel.org/r/20220416025231.8082-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vmscan: fix comment for current_may_throttle | Miaohe Lin | 2022-04-28 | 1 | -4/+3
| | | | | | | | | | | Since commit 6d6435811c19 ("remove bdi_congested() and wb_congested() and related functions"), there is no congested backing device check anymore. Correct the comment accordingly. [akpm@linux-foundation.org: tweak grammar] Link: https://lkml.kernel.org/r/20220414120202.30082-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vmscan: remove obsolete comment in get_scan_count | Miaohe Lin | 2022-04-28 | 1 | -3/+1
| | | | | | | | | | Since commit 1431d4d11abb ("mm: base LRU balancing on an explicit cost model"), the relative value of each set of LRU lists is based on cost model instead of rotated/scanned ratio. Cleanup the relevant comment. Link: https://lkml.kernel.org/r/20220409030245.61211-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vmscan: sc->reclaim_idx must be a valid zone index | Wei Yang | 2022-04-28 | 1 | -2/+2
| | | | | | | | | | | lruvec_lru_size() is only used in get_scan_count(), so the only possible zone_idx is sc->reclaim_idx. Since sc->reclaim_idx is ensured to be a valid zone idex, we can remove the extra check for zone iteration. Link: https://lkml.kernel.org/r/20220317234624.23358-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>