summaryrefslogtreecommitdiffstats
path: root/arch/s390
Commit message (Collapse)AuthorAgeFilesLines
...
* | | | | | Merge tag 'for-linus-iommufd' of ↵Linus Torvalds2023-02-241-0/+17
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Some polishing and small fixes for iommufd: - Remove IOMMU_CAP_INTR_REMAP, instead rely on the interrupt subsystem - Use GFP_KERNEL_ACCOUNT inside the iommu_domains - Support VFIO_NOIOMMU mode with iommufd - Various typos - A list corruption bug if HWPTs are used for attach" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not add the same hwpt to the ioas->hwpt_list twice iommufd: Make sure to zero vfio_iommu_type1_info before copying to user vfio: Support VFIO_NOIOMMU with iommufd iommufd: Add three missing structures in ucmd_buffer selftests: iommu: Fix test_cmd_destroy_access() call in user_copy iommu: Remove IOMMU_CAP_INTR_REMAP irq/s390: Add arch_is_isolated_msi() for s390 iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSI genirq/msi: Rename IRQ_DOMAIN_MSI_REMAP to IRQ_DOMAIN_ISOLATED_MSI genirq/irqdomain: Remove unused irq_domain_check_msi_remap() code iommufd: Convert to msi_device_has_isolated_msi() vfio/type1: Convert to iommu_group_has_isolated_msi() iommu: Add iommu_group_has_isolated_msi() genirq/msi: Add msi_device_has_isolated_msi()
| * \ \ \ \ \ Merge tag 'v6.2' into iommufd.git for-nextJason Gunthorpe2023-02-2112-76/+111
| |\ \ \ \ \ \ | | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve conflicts from the signature change in iommu_map: - drivers/infiniband/hw/usnic/usnic_uiom.c Switch iommu_map_atomic() to iommu_map(.., GFP_ATOMIC) - drivers/vfio/vfio_iommu_type1.c Following indenting change for GFP_KERNEL Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | | | | | Merge branch 'iommu-memory-accounting' of ↵Jason Gunthorpe2023-01-302-16/+20
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu intoiommufd/for-next Jason Gunthorpe says: ==================== iommufd follows the same design as KVM and uses memory cgroups to limit the amount of kernel memory a iommufd file descriptor can pin down. The various internal data structures already use GFP_KERNEL_ACCOUNT to charge its own memory. However, one of the biggest consumers of kernel memory is the IOPTEs stored under the iommu_domain and these allocations are not tracked. This series is the first step in fixing it. The iommu driver contract already includes a 'gfp' argument to the map_pages op, allowing iommufd to specify GFP_KERNEL_ACCOUNT and then having the driver allocate the IOPTE tables with that flag will capture a significant amount of the allocations. Update the iommu_map() API to pass in the GFP argument, and fix all call sites. Replace iommu_map_atomic(). Audit the "enterprise" iommu drivers to make sure they do the right thing. Intel and S390 ignore the GFP argument and always use GFP_ATOMIC. This is problematic for iommufd anyhow, so fix it. AMD and ARM SMMUv2/3 are already correct. A follow up series will be needed to capture the allocations made when the iommu_domain itself is allocated, which will complete the job. ==================== * 'iommu-memory-accounting' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/s390: Use GFP_KERNEL in sleepable contexts iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'s iommu/intel: Use GFP_KERNEL in sleepable contexts iommu/intel: Support the gfp argument to the map_pages op iommu/intel: Add a gfp parameter to alloc_pgtable_page() iommufd: Use GFP_KERNEL_ACCOUNT for iommu_map() iommu/dma: Use the gfp parameter in __iommu_dma_alloc_noncontiguous() iommu: Add a gfp parameter to iommu_map_sg() iommu: Remove iommu_map_atomic() iommu: Add a gfp parameter to iommu_map() Link: https://lore.kernel.org/linux-iommu/0-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | | | | | | irq/s390: Add arch_is_isolated_msi() for s390Jason Gunthorpe2023-01-111-0/+17
| | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s390 doesn't use irq_domains, so it has no place to set IRQ_DOMAIN_FLAG_ISOLATED_MSI. Instead of continuing to abuse the iommu subsystem to convey this information add a simple define which s390 can make statically true. The define will cause msi_device_has_isolated() to return true. Remove IOMMU_CAP_INTR_REMAP from the s390 iommu driver. Link: https://lore.kernel.org/r/8-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
* | | | | | | Merge tag 'iommu-updates-v6.3' of ↵Linus Torvalds2023-02-242-16/+20
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Consolidate iommu_map/unmap functions. There have been blocking and atomic variants so far, but that was problematic as this approach does not scale with required new variants which just differ in the GFP flags used. So Jason consolidated this back into single functions that take a GFP parameter. - Retire the detach_dev() call-back in iommu_ops - Arm SMMU updates from Will: - Device-tree binding updates: - Cater for three power domains on SM6375 - Document existing compatible strings for Qualcomm SoCs - Tighten up clocks description for platform-specific compatible strings - Enable Qualcomm workarounds for some additional platforms that need them - Intel VT-d updates from Lu Baolu: - Add Intel IOMMU performance monitoring support - Set No Execute Enable bit in PASID table entry - Two performance optimizations - Fix PASID directory pointer coherency - Fix missed rollbacks in error path - Cleanups - Apple t8110 DART support - Exynos IOMMU: - Implement better fault handling - Error handling fixes - Renesas IPMMU: - Add device tree bindings for r8a779g0 - AMD IOMMU: - Various fixes for handling on SNP-enabled systems and handling of faults with unknown request-ids - Cleanups and other small fixes - Various other smaller fixes and cleanups * tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (71 commits) iommu/amd: Skip attach device domain is same as new domain iommu: Attach device group to old domain in error path iommu/vt-d: Allow to use flush-queue when first level is default iommu/vt-d: Fix PASID directory pointer coherency iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode iommu/vt-d: Fix error handling in sva enable/disable paths iommu/amd: Improve page fault error reporting iommu/amd: Do not identity map v2 capable device when snp is enabled iommu: Fix error unwind in iommu_group_alloc() iommu/of: mark an unused function as __maybe_unused iommu: dart: DART_T8110_ERROR range should be 0 to 5 iommu/vt-d: Enable IOMMU perfmon support iommu/vt-d: Add IOMMU perfmon overflow handler support iommu/vt-d: Support cpumask for IOMMU perfmon iommu/vt-d: Add IOMMU perfmon support iommu/vt-d: Support Enhanced Command Interface iommu/vt-d: Retrieve IOMMU perfmon capability information iommu/vt-d: Support size of the register set in DRHD iommu/vt-d: Set No Execute Enable bit in PASID table entry iommu/vt-d: Remove sva from intel_svm_dev ...
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| *-. \ \ \ \ \ \ Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', ↵Joerg Roedel2023-02-182-16/+20
| |\ \ \ \ \ \ \ \ | | | |/ / / / / / | | |/| / / / / / | | | |/ / / / / | | |_| / / / / | |/| | / / / 'x86/vt-d', 'x86/amd' and 'core' into next
| | | * | | | iommu/s390: Use GFP_KERNEL in sleepable contextsJason Gunthorpe2023-01-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These contexts are sleepable, so use the proper annotation. The GFP_ATOMIC was added mechanically in the prior patches. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * | | | iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'sJason Gunthorpe2023-01-252-16/+20
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dma_alloc_cpu_table() and dma_alloc_page_table() are eventually called by iommufd through s390_iommu_map_pages() and it should not be forced to atomic. Thread the gfp parameter through the call chain starting from s390_iommu_map_pages(). Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | | | | | Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds2023-02-234-11/+10
|\ \ \ \ \ \ | |_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
| * | | | | mm: replace vma->vm_flags indirect modification in ksm_madviseSuren Baghdasaryan2023-02-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace indirect modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. Link: https://lkml.kernel.org/r/20230126193752.297968-6-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan2023-02-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | mm: add vma_alloc_zeroed_movable_folio()Matthew Wilcox (Oracle)2023-02-021-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace alloc_zeroed_user_highpage_movable(). The main difference is returning a folio containing a single page instead of returning the page, but take the opportunity to rename the function to match other allocation functions a little better and rewrite the documentation to place more emphasis on the zeroing rather than the highmem aspect. Link: https://lkml.kernel.org/r/20230116191813.2145215-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVEDavid Hildenbrand2023-02-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that support swp PTEs, so let's drop it. Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * | | | | mm: remove zap_page_range and create zap_vma_pagesMike Kravetz2023-01-182-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zap_page_range was originally designed to unmap pages within an address range that could span multiple vmas. While working on [1], it was discovered that all callers of zap_page_range pass a range entirely within a single vma. In addition, the mmu notification call within zap_page range does not correctly handle ranges that span multiple vmas. When crossing a vma boundary, a new mmu_notifier_range_init/end call pair with the new vma should be made. Instead of fixing zap_page_range, do the following: - Create a new routine zap_vma_pages() that will remove all pages within the passed vma. Most users of zap_page_range pass the entire vma and can use this new routine. - For callers of zap_page_range not passing the entire vma, instead call zap_page_range_single(). - Remove zap_page_range. [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/ Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | | | | | Merge tag 'net-next-6.3' of ↵Linus Torvalds2023-02-211-37/+678
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
| * \ \ \ \ \ Daniel Borkmann says:Jakub Kicinski2023-02-101-37/+678
| |\ \ \ \ \ \ | | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | | s390/bpf: Implement bpf_jit_supports_kfunc_call()Ilya Leoshkevich2023-01-291-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement calling kernel functions from eBPF. In general, the eBPF ABI is fairly close to that of s390x, with one important difference: on s390x callers should sign-extend signed arguments. Handle that by using information returned by bpf_jit_find_kfunc_model(). Here is an example of how sign extensions works. Suppose we need to call the following function from BPF: ; long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d) 0000000000936a78 <bpf_kfunc_call_test4>: 936a78: c0 04 00 00 00 00 jgnop bpf_kfunc_call_test4 ; return (long)a + (long)b + (long)c + d; 936a7e: b9 08 00 45 agr %r4,%r5 936a82: b9 08 00 43 agr %r4,%r3 936a86: b9 08 00 24 agr %r2,%r4 936a8a: c0 f4 00 1e 3b 27 jg <__s390_indirect_jump_r14> As per the s390x ABI, bpf_kfunc_call_test4() has the right to assume that a, b and c are sign-extended by the caller, which results in using 64-bit additions (agr) without any additional conversions. Without sign extension we would have the following on the JITed code side: ; tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); ; 5: b4 10 00 00 ff ff ff fd w1 = -3 0x3ff7fdcdad4: llilf %r2,0xfffffffd ; 6: b4 20 00 00 ff ff ff e2 w2 = -30 0x3ff7fdcdada: llilf %r3,0xffffffe2 ; 7: b4 30 00 00 ff ff ff 38 w3 = -200 0x3ff7fdcdae0: llilf %r4,0xffffff38 ; 8: b7 40 00 00 ff ff fc 18 r4 = -1000 0x3ff7fdcdae6: lgfi %r5,-1000 0x3ff7fdcdaec: mvc 64(4,%r15),160(%r15) 0x3ff7fdcdaf2: lgrl %r1,bpf_kfunc_call_test4@GOT 0x3ff7fdcdaf8: brasl %r14,__s390_indirect_jump_r1 This first 3 llilfs are 32-bit loads, that need to be sign-extended to 64 bits. Note: at the moment bpf_jit_find_kfunc_model() does not seem to play nicely with XDP metadata functions: add_kfunc_call() adds an "abstract" bpf_*() version to kfunc_btf_tab, but then fixup_kfunc_call() puts the concrete version into insn->imm, which bpf_jit_find_kfunc_model() cannot find. But this seems to be a common code problem. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-7-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()Ilya Leoshkevich2023-01-291-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow mixing subprogs and tail calls by passing the current tail call count to subprogs. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-6-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | s390/bpf: Implement arch_prepare_bpf_trampoline()Ilya Leoshkevich2023-01-291-22/+520
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch_prepare_bpf_trampoline() is used for direct attachment of eBPF programs to various places, bypassing kprobes. It's responsible for calling a number of eBPF programs before, instead and/or after whatever they are attached to. Add a s390x implementation, paying attention to the following: - Reuse the existing JIT infrastructure, where possible. - Like the existing JIT, prefer making multiple passes instead of backpatching. Currently 2 passes is enough. If literal pool is introduced, this needs to be raised to 3. However, at the moment adding literal pool only makes the code larger. If branch shortening is introduced, the number of passes needs to be increased even further. - Support both regular and ftrace calling conventions, depending on the trampoline flags. - Use expolines for indirect calls. - Handle the mismatch between the eBPF and the s390x ABIs. - Sign-extend fmod_ret return values. invoke_bpf_prog() produces about 120 bytes; it might be possible to slightly optimize this, but reaching 50 bytes, like on x86_64, looks unrealistic: just loading cookie, __bpf_prog_enter, bpf_func, insnsi and __bpf_prog_exit as literals already takes at least 5 * 12 = 60 bytes, and we can't use relative addressing for most of them. Therefore, lower BPF_MAX_TRAMP_LINKS on s390x. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-5-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | s390/bpf: Implement bpf_arch_text_poke()Ilya Leoshkevich2023-01-291-0/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bpf_arch_text_poke() is used to hotpatch eBPF programs and trampolines. s390x has a very strict hotpatching restriction: the only thing that is allowed to be hotpatched is conditional branch mask. Take the same approach as commit de5012b41e5c ("s390/ftrace: implement hotpatching"): create a conditional jump to a "plt", which loads the target address from memory and jumps to it; then first patch this address, and then the mask. Trampolines (introduced in the next patch) respect the ftrace calling convention: the return address is in %r0, and %r1 is clobbered. With that in mind, bpf_arch_text_poke() does not differentiate between jumps and calls. However, there is a simple optimization for jumps (for the epilogue_ip case): if a jump already points to the destination, then there is no "plt" and we can just flip the mask. For simplicity, the "plt" template is defined in assembly, and its size is used to define C arrays. There doesn't seem to be a way to convey this size to C as a constant, so it's hardcoded and double-checked during runtime. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | s390/bpf: Add expoline to tail callsIlya Leoshkevich2023-01-291-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the indirect jumps in the eBPF JIT already use expolines, except for the tail call one. Fixes: de5cb6eb514e ("s390: use expoline thunks in the BPF JIT") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| | * | | | | s390/bpf: Fix a typo in a commentIlya Leoshkevich2023-01-281-1/+1
| | | |_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "desription" should be "description". Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-27-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* | | | | | Merge tag 'v6.3-p1' of ↵Linus Torvalds2023-02-212-5/+1
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Use kmap_local instead of kmap_atomic - Change request callback to take void pointer - Print FIPS status in /proc/crypto (when enabled) Algorithms: - Add rfc4106/gcm support on arm64 - Add ARIA AVX2/512 support on x86 Drivers: - Add TRNG driver for StarFive SoC - Delete ux500/hash driver (subsumed by stm32/hash) - Add zlib support in qat - Add RSA support in aspeed" * tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits) crypto: x86/aria-avx - Do not use avx2 instructions crypto: aspeed - Fix modular aspeed-acry crypto: hisilicon/qm - fix coding style issues crypto: hisilicon/qm - update comments to match function crypto: hisilicon/qm - change function names crypto: hisilicon/qm - use min() instead of min_t() crypto: hisilicon/qm - remove some unused defines crypto: proc - Print fips status crypto: crypto4xx - Call dma_unmap_page when done crypto: octeontx2 - Fix objects shared between several modules crypto: nx - Fix sparse warnings crypto: ecc - Silence sparse warning tls: Pass rec instead of aead_req into tls_encrypt_done crypto: api - Remove completion function scaffolding tls: Remove completion function scaffolding tipc: Remove completion function scaffolding net: ipv6: Remove completion function scaffolding net: ipv4: Remove completion function scaffolding net: macsec: Remove completion function scaffolding dm: Remove completion function scaffolding ...
| * | | | | | crypto: s390/aes - drop redundant xts key checkVladis Dronov2023-01-061-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xts_fallback_setkey() in xts_aes_set_key() will now enforce key size rule in FIPS mode when setting up the fallback algorithm keys, which makes the check in xts_aes_set_key() redundant or unreachable. So just drop this check. xts_fallback_setkey() now makes a key size check in xts_verify_key(): xts_fallback_setkey() crypto_skcipher_setkey() [ skcipher_setkey_unaligned() ] cipher->setkey() { .setkey = xts_setkey } xts_setkey() xts_verify_key() Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | | | | | crypto: xts - drop xts_check_key()Vladis Dronov2023-01-061-1/+1
| | |_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xts_check_key() is obsoleted by xts_verify_key(). Over time XTS crypto drivers adopted the newer xts_verify_key() variant, but xts_check_key() is still used by a number of drivers. Switch drivers to use the newer xts_verify_key() and make a couple of cleanups. This allows us to drop xts_check_key() completely and avoid redundancy. Signed-off-by: Vladis Dronov <vdronov@redhat.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Nicolai Stange <nstange@suse.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | | | Merge tag 's390-6.3-1' of ↵Linus Torvalds2023-02-2177-1367/+1935
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Large cleanup of the con3270/tty3270 driver. Among others this fixes: - Background Color Support - ASCII Line Character Support - VT100 Support - Geometries other than 80x24 - Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to uaccess functions, which will be used by KVM to access KVM guest memory with a specific storage key - Add support for user space events counting to CPUMF - Cleanup the vfio/ccw code, which also allows now to properly support 2K Format-2 IDALs - Move kernel page table allocation and initialization to decompressor, which finally allows to enter the kernel with dynamic address translation enabled. This in turn allows to get rid of code with special handling in the kernel, which has to distinguish if DAT is on or off - Replace kretprobe with rethook - Various improvements to vfio/ap queue resets: - Use TAPQ to verify completion of a reset in progress rather than multiple invocations of ZAPQ. - Check TAPQ response codes when verifying successful completion of ZAPQ. - Fix erroneous handling of some error response codes. - Increase the maximum amount of time to wait for successful completion of ZAPQ - Rework system call wrappers to get rid of alias functions, which were only left on s390 - Cleanup diag288_wdt watchdog driver. It has been agreed on with Guenter Roeck that this goes upstream via the s390 tree - Add missing loadparm parameter handling for list-directed ECKD ipl/reipl - Various improvements to memory detection code - Remove arch_cpu_idle_time() since the current implementation is broken, and allows user space observable accounted idle times which can temporarily decrease - Add Reset DAT-Protection support: (only) allow to change PTEs from RO to RW with a new RDP instruction. Unlike the currently used IPTE instruction, this does not necessarily guarantee that TLBs of all CPUs are synchronously flushed; and that remote CPUs can see spurious protection faults. The overall improvement for not requiring an all CPU synchronization, like it is required with IPTE, should be beneficial - Fix KFENCE page fault reporting - Smaller cleanups and improvement all over the place * tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (182 commits) s390/irq,idle: simplify idle check s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag() s390/processor: let cpu helper functions return boolean values s390/kfence: fix page fault reporting s390/zcrypt: introduce ctfm field in struct CPRBX s390: remove confusing comment from uapi types header file vfio/ccw: remove WARN_ON during shutdown s390/entry: remove toolchain dependent micro-optimization s390/mem_detect: do not truncate online memory ranges info s390/vx: remove __uint128_t type from __vector128 struct again s390/mm: add support for RDP (Reset DAT-Protection) s390/mm: define private VM_FAULT_* reasons from top bits Documentation: s390: correct spelling s390/ap: fix status returned by ap_qact() s390/ap: fix status returned by ap_aqic() s390: vfio-ap: tighten the NIB validity check Revert "s390/mem_detect: do not update output parameters on failure" s390/idle: remove arch_cpu_idle_time() and corresponding code s390/vx: use simple assignments to access __vector128 members s390/vx: add 64 and 128 bit members to __vector128 struct ...
| * | | | | | s390/irq,idle: simplify idle checkHeiko Carstens2023-02-153-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the per-cpu CIF_ENABLED_WAIT flag to decide if an interrupt occurred while a cpu was idle, instead of checking two conditions within the old psw. Also move clearing of the CIF_ENABLED_WAIT bit to the early interrupt handler, which in turn makes arch_vcpu_is_preempted() also a bit more precise, since the flag is now cleared before interrupt handlers have been called. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag()Heiko Carstens2023-02-151-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add test_and_set_cpu_flag() and test_and_clear_cpu_flag() helper functions. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/processor: let cpu helper functions return boolean valuesHeiko Carstens2023-02-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let cpu helper functions return boolean values. This also allows to make the code a bit simpler by getting rid of the "!!" construct. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/kfence: fix page fault reportingHeiko Carstens2023-02-151-14/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Baoquan He reported lots of KFENCE reports when /proc/kcore is read, e.g. with crash or even simpler with dd: BUG: KFENCE: invalid read in copy_from_kernel_nofault+0x5e/0x120 Invalid read at 0x00000000f4f5149f: copy_from_kernel_nofault+0x5e/0x120 read_kcore+0x6b2/0x870 proc_reg_read+0x9a/0xf0 vfs_read+0x94/0x270 ksys_read+0x70/0x100 __do_syscall+0x1d0/0x200 system_call+0x82/0xb0 The reason for this is that read_kcore() simply reads memory that might have been unmapped by KFENCE with copy_from_kernel_nofault(). Any fault due to pages being unmapped by KFENCE would be handled gracefully by the fault handler (exception table fixup). However the s390 fault handler first reports the fault, and only afterwards would perform the exception table fixup. Most architectures have this in reversed order, which also avoids the false positive KFENCE reports when an unmapped page is accessed. Therefore change the s390 fault handler so it handles exception table fixups before KFENCE page faults are reported. Reported-by: Baoquan He <bhe@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Acked-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20230213183858.1473681-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/zcrypt: introduce ctfm field in struct CPRBXHarald Freudenberger2023-02-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the CPRBX struct to expose a new field ctfm for use with hardware command filtering within a CEX8 crypto card in CCA coprocessor mode. The field replaces a reserved byte padding field so that the layout of the struct and the size does not change. The new field is used only by user space applications which may use this to expose the HW filtering facilities in the crypto firmware layers. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390: remove confusing comment from uapi types header fileHeiko Carstens2023-02-141-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment for addr_t doesn't make too much sense. Given that also the formatting is incorrect, just remove it. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/entry: remove toolchain dependent micro-optimizationHeiko Carstens2023-02-141-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of CONFIG_AS_IS_LLVM in entry.S to make the code a bit more readable. This removes a micro-optimization, but given that the llvm IAS limitation will likely stay, just use the version that works with llvm. See commit 4c25f0ff6336 ("s390/entry: workaround llvm's IAS limitations") for further details. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/mem_detect: do not truncate online memory ranges infoVasily Gorbik2023-02-148-28/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bf64f0517e5d ("s390/mem_detect: handle online memory limit just once") introduced truncation of mem_detect online ranges based on identity mapping size. For kdump case however the full set of online memory ranges has to be feed into memblock_physmem_add so that crashed system memory could be extracted. Instead of truncating introduce a "usable limit" which is respected by mem_detect api. Also add extra online memory ranges iterator which still provides full set of online memory ranges disregarding the "usable limit". Fixes: bf64f0517e5d ("s390/mem_detect: handle online memory limit just once") Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/vx: remove __uint128_t type from __vector128 struct againHeiko Carstens2023-02-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __uint128_t member was only added for future convenience to the __vector128 struct. However this is a uapi header file, 31/32 bit (aka compat layer) is still supported, but doesn't know anything about this type: /usr/include/asm/types.h:27:17: error: unknown type name __uint128_t 27 | __uint128_t v; Therefore remove it again. Fixes: b0b7b43fcc46 ("s390/vx: add 64 and 128 bit members to __vector128 struct") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/mm: add support for RDP (Reset DAT-Protection)Gerald Schaefer2023-02-144-1/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDP instruction allows to reset DAT-protection bit in a PTE, with less CPU synchronization overhead than IPTE instruction. In particular, IPTE can cause machine-wide synchronization overhead, and excessive IPTE usage can negatively impact machine performance. RDP can be used instead of IPTE, if the new PTE only differs in SW bits and _PAGE_PROTECT HW bit, for PTE protection changes from RO to RW. SW PTE bit changes are allowed, e.g. for dirty and young tracking, but none of the other HW-defined part of the PTE must change. This is because the architecture forbids such changes to an active and valid PTE, which is why invalidation with IPTE is always used first, before writing a new entry. The RDP optimization helps mainly for fault-driven SW dirty-bit tracking. Writable PTEs are initially always mapped with HW _PAGE_PROTECT bit set, to allow SW dirty-bit accounting on first write protection fault, where the DAT-protection would then be reset. The reset is now done with RDP instead of IPTE, if RDP instruction is available. RDP cannot always guarantee that the DAT-protection reset is propagated to all CPUs immediately. This means that spurious TLB protection faults on other CPUs can now occur. For this, common code provides a flush_tlb_fix_spurious_fault() handler, which will now be used to do a CPU-local TLB flush. However, this will clear the whole TLB of a CPU, and not just the affected entry. For more fine-grained flushing, by simply doing a (local) RDP again, flush_tlb_fix_spurious_fault() would need to also provide the PTE pointer. Note that spurious TLB protection faults cannot really be distinguished from racing pagetable updates, where another thread already installed the correct PTE. In such a case, the local TLB flush would be unnecessary overhead, but overall reduction of CPU synchronization overhead by not using IPTE is still expected to be beneficial. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/mm: define private VM_FAULT_* reasons from top bitsPeter Xu2023-02-141-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current definition already collapse with the generic definition of vm_fault_reason. Move the private definitions to allocate bits from the top of uint so they won't collapse anymore. Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20230205231704.909536-4-peterx@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/ap: fix status returned by ap_qact()Halil Pasic2023-02-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") the function ap_qact() tries to grab the status from the wrong part of the register. Thus we always end up with zeros. Which is wrong, among others, because we detect failures via status.response_code. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Harald Freudenberger <freude@linux.ibm.com> Fixes: 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/ap: fix status returned by ap_aqic()Halil Pasic2023-02-101-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There function ap_aqic() tries to grab the status from the wrong part of the register. Thus we always end up with zeros. Which is wrong, among others, because we detect failures via status.response_code. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Fixes: 159491f3b509 ("s390/ap: rework assembler functions to use unions for in/out register variables") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | Revert "s390/mem_detect: do not update output parameters on failure"Heiko Carstens2023-02-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit cbc29f107e51b1cc7d1e7b0bbe0691a1224205f1. Get rid of the following smatch warnings: arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/include/asm/mem_detect.h:86 get_mem_detect_end() error: uninitialized symbol 'end'. arch/s390/boot/vmem.c:256 setup_vmem() error: uninitialized symbol 'start'. arch/s390/boot/vmem.c:258 setup_vmem() error: uninitialized symbol 'end'. Note that there is no bug in the code. This is purely to silence smatch. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/idle: remove arch_cpu_idle_time() and corresponding codeHeiko Carstens2023-02-093-74/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch_cpu_idle_time() returns the idle time of any given cpu if it is in idle, or zero if not. All if this is racy and partially incorrect. Time stamps taken with store clock extended and store clock fast from different cpus are compared, while the architecture states that this is nothing which can be relied on (see Principles of Operation; Chapter 4, "Setting and Inspecting the Clock"). A more fundamental problem is that the timestamp when a cpu is leaving idle is taken early in the assembler part of the interrupt handler, and this value is only transferred many cycles later to the cpu's per-cpu idle data structure. This per cpu data structure is read by arch_cpu_idle() to tell for which period of time a remote cpu is idle: if only an idle_enter value is present, the assumed idle time of the cpu is calculated by taking a local timestamp and returning the difference of the local timestamp and the idle_enter value. This is potentially incorrect, since the remote cpu may have already left idle, but the taken timestamp may not have been transferred to the per-cpu data structure. This in turn means that too much idle time may be reported for a cpu, and a subsequent calculation of system idle time may result in a smaller value. Instead of coming up with even more complex code trying to fix this, just remove this code, and only account idle time of a cpu, after idle state is left. Another minor bug is that it is assumed that timestamps are non-zero, which is not necessarily the case for timestamps taken with store clock fast. This however is just a very minor problem, since this can only happen when the epoch increases. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/vx: use simple assignments to access __vector128 membersHeiko Carstens2023-02-095-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use simple assignments to access __vector128 members instead of hard to read casts. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/vx: add 64 and 128 bit members to __vector128 structHeiko Carstens2023-02-091-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add 64 and 128 bit members to __vector128 struct in order to allow reading of the complete value, or the higher or lower part of vector register contents instead of having to use casts. Add an explicit __aligned(4) statement to avoid that the alignment of the structure changes from 4 to 8. This should make sure that no breakage happens because of this change. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/processor: always inline cpu flag helper functionsHeiko Carstens2023-02-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch_cpu_idle() is marked noinstr and therefore must only call functions which are also not instrumented. Make sure that cpu flag helper functions are always inlined to avoid that the compiler generates an out-of-line function for e.g. the call within arch_cpu_idle(). Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/idle: mark arch_cpu_idle() noinstrHeiko Carstens2023-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | linux-next commit ("cpuidle: tracing: Warn about !rcu_is_watching()") adds a new warning which hits on s390's arch_cpu_idle() function: RCU not on for: arch_cpu_idle+0x0/0x28 WARNING: CPU: 2 PID: 0 at include/linux/trace_recursion.h:162 arch_ftrace_ops_list_func+0x24c/0x258 Modules linked in: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-rc6-next-20230202 #4 Hardware name: IBM 8561 T01 703 (z/VM 7.3.0) Krnl PSW : 0404d00180000000 00000000002b55c0 (arch_ftrace_ops_list_func+0x250/0x258) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: c0000000ffffbfff 0000000080000002 0000000000000026 0000000000000000 0000037ffffe3a28 0000037ffffe3a20 0000000000000000 0000000000000000 0000000000000000 0000000000f4acf6 00000000001044f0 0000037ffffe3cb0 0000000000000000 0000000000000000 00000000002b55bc 0000037ffffe3bb8 Krnl Code: 00000000002b55b0: c02000840051 larl %r2,0000000001335652 00000000002b55b6: c0e5fff512d1 brasl %r14,0000000000157b58 #00000000002b55bc: af000000 mc 0,0 >00000000002b55c0: a7f4ffe7 brc 15,00000000002b558e 00000000002b55c4: 0707 bcr 0,%r7 00000000002b55c6: 0707 bcr 0,%r7 00000000002b55c8: eb6ff0480024 stmg %r6,%r15,72(%r15) 00000000002b55ce: b90400ef lgr %r14,%r15 Call Trace: [<00000000002b55c0>] arch_ftrace_ops_list_func+0x250/0x258 ([<00000000002b55bc>] arch_ftrace_ops_list_func+0x24c/0x258) [<0000000000f5f0fc>] ftrace_common+0x1c/0x20 [<00000000001044f6>] arch_cpu_idle+0x6/0x28 [<0000000000f4acf6>] default_idle_call+0x76/0x128 [<00000000001cc374>] do_idle+0xf4/0x1b0 [<00000000001cc6ce>] cpu_startup_entry+0x36/0x40 [<0000000000119d00>] smp_start_secondary+0x140/0x150 [<0000000000f5d2ae>] restart_int_handler+0x6e/0x90 Mark arch_cpu_idle() noinstr like all other architectures with CONFIG_ARCH_WANTS_NO_INSTR (should) have it to fix this. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | s390/idle: move idle time accounting to account_idle_time_irq()Heiko Carstens2023-02-091-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason to do idle time accounting in arch_cpu_idle(). Do idle time accounting in account_idle_time_irq(), where it belongs to. The accounted values don't change between account_idle_time_irq() and arch_cpu_idle(); so the result is the same. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | | | | | Merge branch 'cmpxchg_user_key' into featuresHeiko Carstens2023-02-094-45/+285
| |\ \ \ \ \ \ | | | |_|_|_|/ | | |/| | | | | | | | | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| | * | | | | s390/uaccess: avoid __ashlti3() callHeiko Carstens2023-01-041-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __cmpxchg_user_key() uses 128 bit types which, depending on compiler and config options, may lead to an __ashlti3() library call. Get rid of that by simply casting the 128 bit values to 32 bit values. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Fixes: 51098f0eb22e ("s390/cmpxchg: make loop condition for 1,2 byte cases precise") Link: https://lore.kernel.org/all/4b96b112d5415d08a81d30657feec2c8c3000f7c.camel@linux.ibm.com/ Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| | * | | | | s390/uaccess: limit number of retries for cmpxchg_user_key()Janis Schoetterl-Glausch2022-11-221-8/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg_user_key() for byte and short values is implemented via a one word cmpxchg loop. Give up trying to perform the cmpxchg if it fails too often because of contention on the cache line. This ensures that the thread cannot become stuck in the kernel. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20221117100745.3253896-1-scgl@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| | * | | | | s390/cmpxchg: make loop condition for 1,2 byte cases preciseJanis Schoetterl-Glausch2022-11-212-62/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cmpxchg implementation for 1 and 2 bytes consists of a 4 byte cmpxchg loop. Currently, the decision to retry is imprecise, looping if bits outside the target byte(s) change instead of retrying until the target byte(s) differ from the old value. E.g. if an attempt to exchange (prev_left_0 old_bytes prev_right_0) is made and it fails because the word at the address is (prev_left_1 x prev_right_1) where both x != old_bytes and one of the prev_*_1 values differs from the respective prev_*_0 value, the cmpxchg is retried, even if by a semantic equivalent to a normal cmpxchg, the exchange would fail. Instead exit the loop if x != old_bytes and retry otherwise. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20221116144711.3811011-1-scgl@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>