summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mm, hugetlb, hwpoison: separate branch for free and in-use hugepageNaoya Horiguchi2022-04-282-2/+6
| | | | | | | | | | | | | | | We know that HPageFreed pages should have page refcount 0, so get_page_unless_zero() always fails and returns 0. So explicitly separate the branch based on page state for minor optimization and better readability. Link: https://lkml.kernel.org/r/20220415041848.GA3034499@ik1-406-35019.vs.sakura.ne.jp Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memory-failure.c: dissolve truncated hugetlb pageMiaohe Lin2022-04-281-5/+4
| | | | | | | | | | | | | | | If me_huge_page meets a truncated but not yet freed hugepage, it won't be dissolved even if we hold the last refcnt. It's because the hugepage has NULL page_mapping while it's not anonymous hugepage too. Thus we lose the last chance to dissolve it into buddy to save healthy subpages. Remove PageAnon check to handle these hugepages too. Link: https://lkml.kernel.org/r/20220414114941.11223-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memory-failure.c: minor cleanup for HWPoisonHandlableMiaohe Lin2022-04-281-5/+3
| | | | | | | | | | | | | | | | | | | | | | Patch series "A few fixup and cleanup patches for memory failure", v2. This series contains a patch to clean up the HWPoisonHandlable and another one to dissolve truncated hugetlb page. More details can be found in the respective changelogs. This patch (of 2): The local variable movable can be removed by returning true directly. Also fix typo 'mirgate'. No functional change intended. Link: https://lkml.kernel.org/r/20220414114941.11223-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220414114941.11223-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Revert "mm/memory-failure.c: fix race with changing page compound again"Naoya Horiguchi2022-04-283-13/+0
| | | | | | | | | | | | | | | | | Reverts commit 888af2701db7 ("mm/memory-failure.c: fix race with changing page compound again") because now we fetch the page refcount under hugetlb_lock in try_memory_failure_hugetlb() so that the race check is no longer necessary. Link: https://lkml.kernel.org/r/20220408135323.1559401-4-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/hwpoison: put page in already hwpoisoned case with MF_COUNT_INCREASEDNaoya Horiguchi2022-04-281-0/+2
| | | | | | | | | | | | | | | | | In already hwpoisoned case, memory_failure() is supposed to return with releasing the page refcount taken for error handling. But currently the refcount is not released when called with MF_COUNT_INCREASED, which makes page refcount inconsistent. This should be rare and non-critical, but it might be inconvenient in testing (unpoison doesn't work). Link: https://lkml.kernel.org/r/20220408135323.1559401-3-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memory-failure.c: remove unnecessary (void*) conversionsliqiong2022-04-281-2/+2
| | | | | | | | | | No need cast (void*) to (struct hwp_walk*). Link: https://lkml.kernel.org/r/20220322142826.25939-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: wrap __find_buddy_pfn() with a necessary buddy page validationZi Yan2022-04-283-77/+101
| | | | | | | | | | | | | | | | | | | | | Whenever the buddy of a page is found from __find_buddy_pfn(), page_is_buddy() should be used to check its validity. Add a helper function find_buddy_page_pfn() to find the buddy page and do the check together. [ziy@nvidia.com: updates per David] Link: https://lkml.kernel.org/r/20220401230804.1658207-2-zi.yan@sent.com Link: https://lore.kernel.org/linux-mm/CAHk-=wji_AmYygZMTsPMdJ7XksMt7kOur8oDfDdniBRMjm4VkQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/7236E7CA-B5F1-4C04-AB85-E86FA3E9A54B@nvidia.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: page_alloc: simplify pageblock migratetype check in __free_one_page()Zi Yan2022-04-281-29/+17
| | | | | | | | | | | | | | | | | | Move pageblock migratetype check code in the while loop to simplify the logic. It also saves redundant buddy page checking code. Link: https://lkml.kernel.org/r/20220401230804.1658207-1-zi.yan@sent.com Link: https://lore.kernel.org/linux-mm/27ff69f9-60c5-9e59-feb2-295250077551@suse.cz/ Signed-off-by: Zi Yan <ziy@nvidia.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/page_alloc: adding same penalty is enough to get round-robin orderWei Yang2022-04-281-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make node order in round-robin in the same distance group, we add a penalty to the first node we got in each round. To get a round-robin order in the same distance group, we don't need to decrease the penalty since: * find_next_best_node() always iterates node in the same order * distance matters more then penalty in find_next_best_node() * in nodes with the same distance, the first one would be picked up So it is fine to increase same penalty when we get the first node in the same distance group. Since we just increase a constance of 1 to node penalty, it is not necessary to multiply MAX_NODE_LOAD for preference. [richard.weiyang@gmail.com: remove remove MAX_NODE_LOAD, per Vlastimil] Link: https://lkml.kernel.org/r/20220412001319.7462-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20220123013537.20491-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Krupa Ramakrishnan <krupa.ramakrishnan@amd.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Documentation/sysctl: document page_lock_unfairnessJoel Savitz2022-04-281-0/+9
| | | | | | | | | | | | | | | | | | | | commit 5ef64cc8987a ("mm: allow a controlled amount of unfairness in the page lock") introduced a new systctl but no accompanying documentation. Add a simple entry to the documentation. Link: https://lkml.kernel.org/r/20220325164437.120246-1-jsavitz@redhat.com Signed-off-by: Joel Savitz <jsavitz@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: "zhangyi (F)" <yi.zhang@huawei.com> Cc: Charan Teja Reddy <charante@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* vmap(): don't allow invalid pagesYury Norov2022-04-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmap() takes struct page *pages as one of arguments, and user may provide an invalid pointer which may lead to corrupted translation table. An example of such behaviour is erroneous usage of virt_to_page(): vaddr1 = dma_alloc_coherent() page = virt_to_page() // Wrong here ... vaddr2 = vmap(page) memset(vaddr2) // Faulting here virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel address. The problem is that vmap() populates pte with bad pfn successfully, and it's much harder to debug at memory access time. This case should be caught by DEBUG_VIRTUAL being that enabled, but it's not enabled in popular distros. Kernel already checks the pages against NULL. In the case mentioned above, however, the address is not NULL, and it's big enough so that the hardware generated Address Size Abort on arm64: [ 665.484101] Unhandled fault at 0xffff8000252cd000 [ 665.488807] Mem abort info: [ 665.491617] ESR = 0x96000043 [ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits [ 665.499985] SET = 0, FnV = 0 [ 665.503039] EA = 0, S1PTW = 0 [ 665.506167] Data abort info: [ 665.509047] ISV = 0, ISS = 0x00000043 [ 665.512882] CM = 0, WnR = 1 [ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000 [ 665.522550] [ffff8000252cd000] pgd=000000affcfff003, pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713 [ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP [ 665.539936] Modules linked in: [...] [ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE 5.4.0-84-generic #94~18.04.1-Ubuntu [ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018 [ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO) [ 665.641407] pc : __memset+0x38/0x188 [ 665.645146] lr : test+0xcc/0x3f8 [ 665.650184] sp : ffff8000359bb840 [ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000 [ 665.658785] x27: 0000000000000000 x26: 0000000000231000 [ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800 [ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000 [ 665.674680] x21: 0000000000001000 x20: 0000000000000000 [ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff [ 665.685276] x17: 00000000588636a5 x16: 0000000000000013 [ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff [ 665.695872] x13: 0000000080000000 x12: 0140000000000000 [ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000 [ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000 [ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000 [ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000 [ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001 [ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000 [ 665.732960] Call trace: [ 665.735395] __memset+0x38/0x188 [...] Interestingly, this abort happens even if copy_from_kernel_nofault() is used, which is quite inconvenient for debugging purposes. This patch adds a pfn_valid() check into vmap() path, so that invalid mapping will not be created; WARN_ON() is used to let client code know that something goes wrong, and it's not a regular EINVAL situation. Link: https://lkml.kernel.org/r/20220422220410.1308706-1-yury.norov@gmail.com Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/vmalloc: fix a commentYixuan Cao2022-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | The sentence "but the mempolcy want to alloc memory by interleaving" should be rephrased with "but the mempolicy wants to alloc memory by interleaving" where "mempolicy" is a struct name. This work is coauthored by Yinan Zhang Jiajian Ye Shenghong Han Chongxi Zhao Yuhong Feng Yongqiang Liu Link: https://lkml.kernel.org/r/20220401064543.4447-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcontrol.c: remove unused private flag of memory.oom_controlLu Jialin2022-04-281-4/+0
| | | | | | | | | | | | | There is no use for the private value, __OOM_TYPE and OOM notifier OOM_CONTROL. Therefore remove them to make the code clean. Link: https://lkml.kernel.org/r/20220421122755.40899-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcontrol.c: make cgroup_memory_noswap staticLu Jialin2022-04-282-5/+1
| | | | | | | | | | | | | | cgroup_memory_noswap is only used in mm/memcontrol.c, therefore just make it static, and remove export in include/linux/memcontrol.h Link: https://lkml.kernel.org/r/20220421124736.62180-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* MAINTAINERS: add corresponding kselftests to memcg entryRoman Gushchin2022-04-281-0/+2
| | | | | | | | | | | | | | | | List memory control and kernel memory control kselftests in the memory resource controller entry. Link: https://lkml.kernel.org/r/20220415000133.3955987-5-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: David Vernet <void@manifault.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* MAINTAINERS: add corresponding kselftests to cgroup entryRoman Gushchin2022-04-281-0/+1
| | | | | | | | | | | | | | | | List cgroup kselftests in the cgroup MAINTAINERS entry. These are tests covering core, freezer and cgroup.kill functionality. Link: https://lkml.kernel.org/r/20220415000133.3955987-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Down <chris@chrisdown.name> Cc: David Vernet <void@manifault.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kselftests: memcg: speed up the memory.high testRoman Gushchin2022-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high") allocating memory over memory.high became very time consuming. But it's exactly what the memory.high test from cgroup kselftests is doing: it tries to allocate 100M with 30M memory.high value. It takes forever to complete. In order to keep it passing (or failing) in a reasonable amount of time let's try to allocate only a little over 30M: 31M to be precise. With this change test_memcontrol finishes in a reasonable amount of time: $ time ./test_memcontrol ok 1 test_memcg_subtree_control ok 2 test_memcg_current ok 3 test_memcg_min ok 4 test_memcg_low ok 5 test_memcg_high ok 6 test_memcg_max ok 7 test_memcg_oom_events ok 8 test_memcg_swap_max ok 9 test_memcg_sock ok 10 test_memcg_oom_group_leaf_events ok 11 test_memcg_oom_group_parent_events ok 12 test_memcg_oom_group_score_events real 0m2.273s user 0m0.064s sys 0m0.739s Link: https://lkml.kernel.org/r/20220415000133.3955987-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* kselftests: memcg: update the oom group leaf events testRoman Gushchin2022-04-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm: memcg kselftests fixes". This patch (of 4): Commit 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events") made memory.events recursive: all events are propagated upwards by the tree. It was a change in semantics. It broke the oom group leaf events test: it assumes that after an OOM the oom_kill counter is zero on parent's level. Let's adjust the test: it should have similar expectations for the child and parent levels. The test passes after this fix. Link: https://lkml.kernel.org/r/20220415000133.3955987-2-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220415000133.3955987-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: David Vernet <void@manifault.com> Cc: Chris Down <chris@chrisdown.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: non-hierarchical mode is deprecatedWei Yang2022-04-281-3/+0
| | | | | | | | | | | | | | After commit bef8620cd8e0 ("mm: memcg: deprecate the non-hierarchical mode"), we won't have a NULL parent except root_mem_cgroup. And this case is handled when (memcg == root). Link: https://lkml.kernel.org/r/20220403020833.26164-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: move generation assignment and comparison togetherWei Yang2022-04-281-3/+7
| | | | | | | | | | | | | | | For each round-trip, we assign generation on first invocation and compare it on subsequent invocations. Let's move them together to make it more self-explaining. Also this reduce a check on prev. [hannes@cmpxchg.org: better comment to explain reclaim model] Link: https://lkml.kernel.org/r/20220330234719.18340-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: set pos explicitly for reclaim and !reclaimWei Yang2022-04-281-3/+2
| | | | | | | | | | | | | During mem_cgroup_iter, there are two ways to get iteration position: reclaim vs non-reclaim mode. Let's do it explicitly for reclaim vs non-reclaim mode. Link: https://lkml.kernel.org/r/20220330234719.18340-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: set memcg after css verified and got referenceWei Yang2022-04-281-8/+3
| | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/memcg: some cleanup for mem_cgroup_iter()", v2. No functional change, try to make it more readable. This patch (of 3): Instead of resetting memcg when css is either not verified or not got reference, we can set it after these process. No functional change, just simplified the code a little. Link: https://lkml.kernel.org/r/20220330234719.18340-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20220330234719.18340-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: mz already removed from rb_tree if not NULLWei Yang2022-04-281-1/+0
| | | | | | | | | | | | | | | When mz is not NULL, it means mz can either come from mem_cgroup_largest_soft_limit_node or __mem_cgroup_largest_soft_limit_node. And both of them have removed this node by __mem_cgroup_remove_exceeded(). Not necessary to call __mem_cgroup_remove_exceeded() again. [mhocko@suse.com: refine changelog] Link: https://lkml.kernel.org/r/20220314233030.12334-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/memcg: remove unneeded nr_scannedMiaohe Lin2022-04-281-4/+1
| | | | | | | | | | | | | | | The local variable nr_scanned is unneeded as mem_cgroup_soft_reclaim always does *total_scanned += nr_scanned. So we can pass total_scanned directly to the mem_cgroup_soft_reclaim to simplify the code and save some cpu cycles of adding nr_scanned to total_scanned. Link: https://lkml.kernel.org/r/20220328114144.53389-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: shmem: make shmem_init return voidMiaohe Lin2022-04-282-7/+4
| | | | | | | | | | | | The return value of shmem_init is never used. So we can make it return void now. [akpm@linux-foundation.org: remove `return;' from void-returning function, per Muchun Song] Link: https://lkml.kernel.org/r/20220328112707.22217-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm: rework calculation of bdi_min_ratio in bdi_set_min_ratioChen Wandun2022-04-281-5/+12
| | | | | | | | | | | | In function bdi_set_min_ratio, min_ratio is unsigned int, it will result underflow when setting min_ratio below bdi->min_ratio, it is confusing. Rework it, no functional change. Link: https://lkml.kernel.org/r/20220422095159.2858305-1-chenwandun@huawei.com Signed-off-by: Chen Wandun <chenwandun@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner_sort.c: avoid repeated judgmentsYixuan Cao2022-04-281-4/+2
| | | | | | | | | | | | | | | | | | | | | I noticed a detail that needs to be adjusted. When judging whether a page is allocated by vmalloc, the value of the variable "tmp" was repeatedly judged, so the code was adjusted. This work is coauthored by Yinan Zhang, Jiajian Ye, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220414042744.13896-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner_sort.c: provide allocator labelling and update --cull ↵Yixuan Cao2022-04-282-16/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and --sort options An application is suspected of having memory leak when its memory consumption is high and keeps increasing. There are several commonly used memory allocators: slab, cma, vmalloc, etc. The memory leak identification can be sped up if the page information allocated by an allocator can be analyzed separately. This patch provides supports for memory allocator labelling for slab, vmalloc, and cma. The pages allocated by slab and cma can be confirmed from the "PFN" line according to the kernel codes, and the label of the vmalloc allocator can be obtained by analyzing the stack trace. Thanks for Vlastimil Babka's constructive suggestions. Based on Yinan Zhang's study, the call chain of vmalloc() is vmalloc() -> ... -> __vmalloc_node_range() -> __vmalloc_area_node(). __vmalloc_area_node() requests memory through the interface of buddy allocation system. In the current version, __vmalloc_area_node() uses four interfaces: alloc_pages_bulk_array_mempolicy(), alloc_pages_bulk_array_node(), alloc_pages() and alloc_pages_node(). By disassembling the code, we find that __vmalloc_area_node() is expanded in __vmalloc_node_range(). So __vmalloc_area_node is not in the stack trace. On the test machine, the stack trace of pages allocated by vmalloc has the following four forms: __alloc_pages_bulk+0x230/0x6a0 __vmalloc_node_range+0x19c/0x598 alloc_pages_bulk_array_mempolicy+0xbc/0x278 __vmalloc_node_range+0x1e8/0x598 __alloc_pages+0x160/0x2b0 __vmalloc_node_range+0x234/0x598 alloc_pages+0xac/0x150 __vmalloc_node_range+0x44c/0x598 Therefore, in two consecutive lines of stacktrace, if the first line contains the word "alloc_pages" and the second line contains the word "__vmalloc_node_range", it can be determined that the page is allocated by vmalloc. And the function offset and size are not the same on different machines, so there is no need to match them. At the same time, this patch updates the --cull and --sort options to support allocator-based merge statistics and sorting. The added functions are fully compatible with the original work. When using, you can use "allocator", or abbreviated as "ator". Relevant updates have also been made in the documentation(Documentation/vm/page_owner.rst). Example: ./page_owner_sort <input> <output> --cull=st,pid,name,allocator ./page_owner_sort <input> <output> --sort=ator,pid,name This work is coauthored by Jiajian Ye, Yinan Zhang, Shenghong Han, Chongxi Zhao, Yuhong Feng and Yongqiang Liu. Link: https://lkml.kernel.org/r/20220410132932.9402-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner: support debug log to avoid huge log printHaowen Bai2022-04-281-8/+20
| | | | | | | | | | | | | | | | | | As normal usage, tool will print huge parser log and spend a lot of time printing, so it would be preferable add "-d" debug control to avoid this problem. Link: https://lkml.kernel.org/r/1649672446-5685-1-git-send-email-baihaowen@meizu.com Signed-off-by: Haowen Bai <baihaowen@meizu.com> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner_sort.c: support sorting blocks by multiple keysJiajian Ye2022-04-282-23/+165
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When viewing page owner information, we may want to sort blocks of information by multiple keys, since one single key does not uniquely identify a block. Therefore, following adjustments are made: 1. Add a new --sort option to support sorting blocks of information by multiple keys. ./page_owner_sort <input> <output> --sort=<order> ./page_owner_sort <input> <output> --sort <order> <order> is a single argument in the form of a comma-separated list, which offers a way to specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]]. The ascending or descending order can be specified by adding the + (ascending, default) or - (descend -ing) prefix to the key: ./page_owner_sort <input> <output> [option] --sort -key1,+key2,key3... For example, to sort the blocks first by task command name in lexicographic order and then by pid in ascending numerical order, use the following: ./page_owner_sort <input> <output> --sort=name,+pid To sort the blocks first by pid in ascending order and then by timestamp of the page when it is allocated in descending order, use the following: ./page_owner_sort <input> <output> --sort=pid,-alloc_ts 2. Add explanations of a newly added --sort option in the function usage() and the document(Documentation/vm/page_owner.rst). This work is coauthored by Yixuan Cao Shenghong Han Yinan Zhang Chongxi Zhao Yuhong Feng Yongqiang Liu Link: https://lkml.kernel.org/r/20220401024856.767-3-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner_sort.c: support for multi-value selection in single argumentJiajian Ye2022-04-282-26/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When viewing page owner information, we may want to select blocks whose PID/TGID/TASK_COMM_NAME appears in a user-specified list for data analysis and aggregation. But currently page_owner_sort only supports selecting blocks associated with only one specified PID/TGID/TASK_COMM_NAME. Therefore, following adjustments are made to fix the problem: 1. Enhance selecting function to support the selection of multiple PIDs/TGIDs/TASK_COMM_NAMEs. The enhanced usages are as follows: --pid <pidlist> Select by pid. This selects the blocks whose PID numbers appear in <pidlist>. --tgid <tgidlist> Select by tgid. This selects the blocks whose TGID numbers appear in <tgidlist>. --name <cmdlist> Select by task command name. This selects the blocks whose task command name appear in <cmdlist>. Where <pidlist>, <tgidlist>, <cmdlist> are single arguments in the form of a comma-separated list,which offers a way to specify individual selecting rules. For example, if you want to select blocks whose tgids are 1, 2 or 3, you have to use 4 commands as follows: ./page_owner_sort <input> <output1> --tgid=1 ./page_owner_sort <input> <output2> --tgid=2 ./page_owner_sort <input> <output3> --tgid=3 cat <output1> <output2> <output3> > <output> With this patch, you can use only 1 command to obtain the same result as above: ./page_owner_sort <input> <output1> --tgid=1,2,3 2. Update explanations of --pid, --tgid and --name in the function usage() and the document(Documents/vm/page_owner.rst). This work is coauthored by Yixuan Cao Shenghong Han Yinan Zhang Chongxi Zhao Yuhong Feng Yongqiang Liu Link: https://lkml.kernel.org/r/20220401024856.767-2-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* tools/vm/page_owner_sort.c: use fprintf() to send error messages to stderrJiajian Ye2022-04-281-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Error messages should be send to stderr using fprintf() instead of printf(). This work is coauthored by Yixuan Cao Shenghong Han Yinan Zhang Chongxi Zhao Yuhong Feng Yongqiang Liu Link: https://lkml.kernel.org/r/20220401024856.767-1-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2022-04-2814-155/+195
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Another relatively quiet week, amdgpu leads the way, some i915 display fixes, and a single sunxi fix. amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU i915: - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes sunxi: - Single fix removing applying PHYS_OFFSET twice" * tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend drm/amd/pm: fix the deadlock issue observed on SI drm/amd/display: Fix memory leak in dcn21_clock_source_create drm/amdgpu: don't runtime suspend if there are displays attached (v3) drm/amdkfd: CRIU add support for GWS queues drm/amdkfd: Fix GWS queue count drm/sun4i: Remove obsolete references to PHYS_OFFSET drm/i915/fbc: Consult hw.crtc instead of uapi.crtc drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses drm/i915: Check EDID for HDR static metadata when choosing blc drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
| * Merge tag 'amd-drm-fixes-5.18-2022-04-27' of ↵Dave Airlie2022-04-2910-140/+165
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.18-2022-04-27: amdgpu: - Runtime pm fix - DCN memory leak fix in error path - SI DPM deadlock fix - S0ix fix amdkfd: - GWS fix - GWS support for CRIU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
| | * drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspendPrike Liang2022-04-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without MMHUB clock gating being enabled then MMHUB will not disconnect from DF and will result in DF C-state entry can't be accessed during S2idle suspend, and eventually s0ix entry will be blocked. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * drm/amd/pm: fix the deadlock issue observed on SIEvan Quan2022-04-274-55/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The adev->pm.mutx is already held at the beginning of amdgpu_dpm_compute_clocks/amdgpu_dpm_enable_uvd/amdgpu_dpm_enable_vce. But on their calling path, amdgpu_display_bandwidth_update will be called and thus its sub functions amdgpu_dpm_get_sclk/mclk. They will then try to acquire the same adev->pm.mutex and deadlock will occur. By placing amdgpu_display_bandwidth_update outside of adev->pm.mutex protection(considering logically they do not need such protection) and restructuring the call flow accordingly, we can eliminate the deadlock issue. This comes with no real logics change. Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Link: https://lore.kernel.org/all/9e689fea-6c69-f4b0-8dee-32c4cf7d8f9c@molgen.mpg.de/ BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1957 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * drm/amd/display: Fix memory leak in dcn21_clock_source_createMiaoqian Lin2022-04-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | When dcn20_clk_src_construct() fails, we need to release clk_src. Fixes: 6f4e6361c3ff ("drm/amd/display: Add Renoir resource (v2)") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * drm/amdgpu: don't runtime suspend if there are displays attached (v3)Alex Deucher2022-04-271-35/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We normally runtime suspend when there are displays attached if they are in the DPMS off state, however, if something wakes the GPU we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend) which can cause userspace to light up the displays again soon after they were turned off. Prior to commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."), the driver took a runtime pm reference when the fbdev emulation was enabled because we didn't implement proper shadowing support for vram access when the device was off so the device never runtime suspended when there was a console bound. Once that commit landed, we now utilize the core fb helper implementation which properly handles the emulation, so runtime pm now suspends in cases where it did not before. Ultimately, we need to sort out why runtime suspend in not working in this case for some users, but this should restore similar behavior to before. v2: move check into runtime_suspend v3: wake ups -> wakeups in comment, retain pm_runtime behavior in runtime_idle callback Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/ Tested-by: Michele Ballabio <ballabio.m@gmail.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
| | * drm/amdkfd: CRIU add support for GWS queuesDavid Yat Sin2022-04-272-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | Add support to checkpoint/restore GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin <david.yatsin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | * drm/amdkfd: Fix GWS queue countDavid Yat Sin2022-04-271-46/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated each time the queue gets evicted. Fixes: b8020b0304c8 ("drm/amdkfd: Enable over-subscription with >1 GWS queue") Signed-off-by: David Yat Sin <david.yatsin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| * | Merge tag 'drm-intel-fixes-2022-04-28' of ↵Dave Airlie2022-04-293-12/+30
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Fix #5284: Backlight control regression on XMG Core 15 e21 - Fix black display plane on Acer One AO532h - Two smaller display fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
| | * | drm/i915/fbc: Consult hw.crtc instead of uapi.crtcVille Syrjälä2022-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | plane_state->uapi.crtc is not what we want to be looking at. If bigjoiner is used hw.crtc is what tells us what crtc the plane is supposedly using. Not an actual problem on current hardware as the only FBC capable pipe (A) can't be a bigjoiner slave and thus uapi.crtc==hw.crtc always here. But when we get more FBC instances this will become actually important. Fixes: 2e6c99f88679 ("drm/i915/fbc: Nuke lots of crap from intel_fbc_state_cache") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220413152852.7336-1-ville.syrjala@linux.intel.com Reviewed-by: Manasi Navare <manasi.d.navare@intel.com> (cherry picked from commit 3e1faae3398789abe8d4797255bfe28d95d81308) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
| | * | drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addressesImre Deak2022-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typo in the _SEL_FETCH_PLANE_BASE_1_B register base address. Fixes: a5523e2ff074a5 ("drm/i915: Add PSR2 selective fetch registers") References: https://gitlab.freedesktop.org/drm/intel/-/issues/5400 Cc: José Roberto de Souza <jose.souza@intel.com> Cc: <stable@vger.kernel.org> # v5.9+ Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220421162221.2261895-1-imre.deak@intel.com (cherry picked from commit af2cbc6ef967f61711a3c40fca5366ea0bc7fecc) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
| | * | drm/i915: Check EDID for HDR static metadata when choosing blcJouni Högander2022-04-251-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have now seen panel (XMG Core 15 e21 laptop) advertizing support for Intel proprietary eDP backlight control via DPCD registers, but actually working only with legacy pwm control. This patch adds panel EDID check for possible HDR static metadata and Intel proprietary eDP backlight control is used only if that exists. Missing HDR static metadata is ignored if user specifically asks for Intel proprietary eDP backlight control via enable_dpcd_backlight parameter. v2 : - Ignore missing HDR static metadata if Intel proprietary eDP backlight control is forced via i915.enable_dpcd_backlight - Printout info message if panel is missing HDR static metadata and support for Intel proprietary eDP backlight control is detected Fixes: 4a8d79901d5b ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5284 Cc: Lyude Paul <lyude@redhat.com> Cc: Mika Kahola <mika.kahola@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Filippo Falezza <filippo.falezza@outlook.it> Cc: stable@vger.kernel.org Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220413082826.120634-1-jouni.hogander@intel.com Reviewed-by: Lyude Paul <lyude@redhat.com> (cherry picked from commit b4b157577cb1de13bee8bebc3576f1de6799a921) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
| | * | drm/i915: Fix DISP_POS_Y and DISP_HEIGHT definesHans de Goede2022-04-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 428cb15d5b00 ("drm/i915: Clean up pre-skl primary plane registers") introduced DISP_POS_Y and DISP_HEIGHT defines but accidentally set these their masks to REG_GENMASK(31, 0) instead of REG_GENMASK(31, 16). This breaks the primary display pane on at least pineview machines, fix the mask to fix the primary display pane only showing black. Tested on an Acer One AO532h with an Intel N450 SoC. Fixes: 428cb15d5b00 ("drm/i915: Clean up pre-skl primary plane registers") Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220418150936.5499-1-hdegoede@redhat.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit 681f8a5c6e372dbfd2a313ace417e7749543de1d) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
| * | | Merge tag 'drm-misc-fixes-2022-04-27' of ↵Dave Airlie2022-04-291-3/+0
| |\ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.18-rc5: - Single fix removing applying PHYS_OFFSET twice in sunxi. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
| | * | drm/sun4i: Remove obsolete references to PHYS_OFFSETSamuel Holland2022-04-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b4bdc4fbf8d0 ("soc: sunxi: Deal with the MBUS DMA offsets in a central place") added a platform device notifier that sets the DMA offset for all of the display engine frontend and backend devices. The code applying the offset to DMA buffer physical addresses was then removed from the backend driver in commit 756668ba682e ("drm/sun4i: backend: Remove the MBUS quirks"), but the code subtracting PHYS_OFFSET was left in the frontend driver. As a result, the offset was applied twice in the frontend driver. This likely went unnoticed because it only affects specific configurations (scaling or certain pixel formats) where the frontend is used, on boards with both one of these older SoCs and more than 1 GB of DRAM. In addition, the references to PHYS_OFFSET prevent compiling the driver on architectures where PHYS_OFFSET is not defined. Fixes: b4bdc4fbf8d0 ("soc: sunxi: Deal with the MBUS DMA offsets in a central place") Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220424162633.12369-4-samuel@sholland.org
* | | | Merge tag 'net-5.18-rc5' of ↵Linus Torvalds2022-04-2870-414/+598
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf and netfilter. Current release - new code bugs: - bridge: switchdev: check br_vlan_group() return value - use this_cpu_inc() to increment net->core_stats, fix preempt-rt Previous releases - regressions: - eth: stmmac: fix write to sgmii_adapter_base Previous releases - always broken: - netfilter: nf_conntrack_tcp: re-init for syn packets only, resolving issues with TCP fastopen - tcp: md5: fix incorrect tcp_header_len for incoming connections - tcp: fix F-RTO may not work correctly when receiving DSACK - tcp: ensure use of most recently sent skb when filling rate samples - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT - virtio_net: fix wrong buf address calculation when using xdp - xsk: fix forwarding when combining copy mode with busy poll - xsk: fix possible crash when multiple sockets are created - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event - wireguard: device: check for metadata_dst with skb_valid_dst() - netfilter: update ip6_route_me_harder to consider L3 domain - gre: make o_seqno start from 0 in native mode - gre: switch o_seqno to atomic to prevent races in collect_md mode Misc: - add Eric Dumazet to networking maintainers - dt: dsa: realtek: remove realtek,rtl8367s string - netfilter: flowtable: Remove the empty file" * tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) tcp: fix F-RTO may not work correctly when receiving DSACK Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK ixgbe: ensure IPsec VF<->PF compatibility MAINTAINERS: Update BNXT entry with firmware files netfilter: nft_socket: only do sk lookups when indev is available net: fec: add missing of_node_put() in fec_enet_init_stop_mode() bnx2x: fix napi API usage sequence tls: Skip tls_append_frag on zero copy size Add Eric Dumazet to networking maintainers netfilter: conntrack: fix udp offload timeout sysctl netfilter: nf_conntrack_tcp: re-init for syn packets only net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK net: Use this_cpu_inc() to increment net->core_stats Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted Bluetooth: hci_event: Fix creating hci_conn object on error status Bluetooth: hci_event: Fix checking for invalid handle on error status ice: fix use-after-free when deinitializing mailbox snapshot ice: wait 5 s for EMP reset after firmware flash ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() ...
| * | | | tcp: fix F-RTO may not work correctly when receiving DSACKPengcheng Yang2022-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently DSACK is regarded as a dupack, which may cause F-RTO to incorrectly enter "loss was real" when receiving DSACK. Packetdrill to demonstrate: // Enable F-RTO and TLP 0 `sysctl -q net.ipv4.tcp_frto=2` 0 `sysctl -q net.ipv4.tcp_early_retrans=3` 0 `sysctl -q net.ipv4.tcp_congestion_control=cubic` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // RTT 10ms, RTO 210ms +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 2 data segments +0 write(4, ..., 2000) = 2000 +0 > P. 1:2001(2000) ack 1 // TLP +.022 > P. 1001:2001(1000) ack 1 // Continue to send 8 data segments +0 write(4, ..., 10000) = 10000 +0 > P. 2001:10001(8000) ack 1 // RTO +.188 > . 1:1001(1000) ack 1 // The original data is acked and new data is sent(F-RTO step 2.b) +0 < . 1:1(0) ack 2001 win 257 +0 > P. 10001:12001(2000) ack 1 // D-SACK caused by TLP is regarded as a dupack, this results in // the incorrect judgment of "loss was real"(F-RTO step 3.a) +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop> // Never-retransmitted data(3001:4001) are acked and // expect to switch to open state(F-RTO step 3.b) +0 < . 1:1(0) ack 4001 win 257 +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }% Fixes: e33099f96d99 ("tcp: implement RFC5682 F-RTO") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2022-04-283-30/+45
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix incorrect TCP connection tracking window reset for non-syn packets, from Florian Westphal. 2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk. 3) Fix nft_socket from the output path, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_socket: only do sk lookups when indev is available netfilter: conntrack: fix udp offload timeout sysctl netfilter: nf_conntrack_tcp: re-init for syn packets only ==================== Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>