summaryrefslogtreecommitdiffstats
path: root/arch/s390/pci
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds2023-08-291-47/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
| * s390: mm: convert to GENERIC_IOREMAPBaoquan He2023-08-181-47/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(), generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and iounmap() are all visible and available to arch. Arch needs to provide wrapper functions to override the generic versions if there's arch specific handling in its ioremap_prot(), ioremap() or iounmap(). This change will simplify implementation by removing duplicated code with generic_ioremap_prot() and generic_iounmap(), and has the equivalent functioality as before. Here, add wrapper functions ioremap_prot() and iounmap() for s390's special operation when ioremap() and iounmap(). And also replace including <asm-generic/io.h> with <asm/io.h> in arch/s390/kernel/perf_cpum_sf.c, otherwise building error will be seen because macro defined in <asm/io.h> can't be seen in perf_cpum_sf.c. Link: https://lkml.kernel.org/r/20230706154520.11257-11-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rich Felker <dalias@libc.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | s390/pci: use builtin_misc_device macro to simplify the codeLi Zetao2023-08-231-6/+1
|/ | | | | | | | | | | Use the builtin_misc_device macro to simplify the code, which is the same as declaring with device_initcall(). Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230815080833.1103609-1-lizetao1@huawei.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390: fix various typosHeiko Carstens2023-07-031-3/+3
| | | | | | | Fix various typos found with codespell. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
* Merge branch 'uaccess-inline-asm-cleanup' into featuresVasily Gorbik2023-04-043-15/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Heiko Carstens says: =================== There are a couple of oddities within the s390 uaccess library functions. Therefore cleanup the whole uaccess.c file. There is no functional change, only improved readability. The output of "objdump -Dr" was always compared before/after each patch to make sure that the generated object file is identical, if that could be expected. Therefore the series also includes more patches than really required to cleanup the code. Furthermore the kunit usercopy tests also still pass. =================== Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * PCI: s390: Fix use-after-free of PCI resources with per-function hotplugNiklas Schnelle2023-03-133-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On s390 PCI functions may be hotplugged individually even when they belong to a multi-function device. In particular on an SR-IOV device VFs may be removed and later re-added. In commit a50297cf8235 ("s390/pci: separate zbus creation from scanning") it was missed however that struct pci_bus and struct zpci_bus's resource list retained a reference to the PCI functions MMIO resources even though those resources are released and freed on hot-unplug. These stale resources may subsequently be claimed when the PCI function re-appears resulting in use-after-free. One idea of fixing this use-after-free in s390 specific code that was investigated was to simply keep resources around from the moment a PCI function first appeared until the whole virtual PCI bus created for a multi-function device disappears. The problem with this however is that due to the requirement of artificial MMIO addreesses (address cookies) extra logic is then needed to keep the address cookies compatible on re-plug. At the same time the MMIO resources semantically belong to the PCI function so tying their lifecycle to the function seems more logical. Instead a simpler approach is to remove the resources of an individually hot-unplugged PCI function from the PCI bus's resource list while keeping the resources of other PCI functions on the PCI bus untouched. This is done by introducing pci_bus_remove_resource() to remove an individual resource. Similarly the resource also needs to be removed from the struct zpci_bus's resource list. It turns out however, that there is really no need to add the MMIO resources to the struct zpci_bus's resource list at all and instead we can simply use the zpci_bar_struct's resource pointer directly. Fixes: a50297cf8235 ("s390/pci: separate zbus creation from scanning") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* | s390/pci: clean up left over special treatment for function zeroNiklas Schnelle2023-03-132-25/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to commit 960ac3626487 ("s390/pci: allow zPCI zbus without a function zero") enabling and scanning a PCI function had to potentially be postponed until the function with devfn zero on that bus was plugged. While the commit removed the waiting itself extra code to scan all functions on the PCI bus once function zero appeared was missed. Remove that code and the outdated comments about waiting for function zero. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230306151014.60913-5-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* | s390/pci: remove redundant pci_bus_add_devices() on new busNiklas Schnelle2023-03-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | The pci_bus_add_devices() call in zpci_bus_create_pci_bus() is without function since at this point no device could have been added to the freshly created PCI bus. Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230306151014.60913-4-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* | s390/pci: only add specific device in zpci_bus_scan_device()Niklas Schnelle2023-03-131-2/+1
|/ | | | | | | | | | | | As the name suggests zpci_bus_scan_device() is used to scan a specific device and thus pci_bus_add_device() for that device is sufficient. Furthermore move this call inside the rescan/remove locking. Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230306151014.60913-3-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* iommu/s390: Use GFP_KERNEL in sleepable contextsJason Gunthorpe2023-01-251-1/+1
| | | | | | | | | | | These contexts are sleepable, so use the proper annotation. The GFP_ATOMIC was added mechanically in the prior patches. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* iommu/s390: Push the gfp parameter to the kmem_cache_alloc()'sJason Gunthorpe2023-01-251-14/+17
| | | | | | | | | | | | | dma_alloc_cpu_table() and dma_alloc_page_table() are eventually called by iommufd through s390_iommu_map_pages() and it should not be forced to atomic. Thread the gfp parameter through the call chain starting from s390_iommu_map_pages(). Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* Merge tag 'iommu-updates-v6.2' of ↵Linus Torvalds2022-12-192-36/+54
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core code: - map/unmap_pages() cleanup - SVA and IOPF refactoring - Clean up and document return codes from device/domain attachment AMD driver: - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and ivrs_acpihid command line options - Some smaller cleanups Intel driver: - Blocking domain support - Cleanups S390 driver: - Fixes and improvements for attach and aperture handling PAMU driver: - Resource leak fix and cleanup Rockchip driver: - Page table permission bit fix Mediatek driver: - Improve safety from invalid dts input - Smaller fixes and improvements Exynos driver: - Fix driver initialization sequence Sun50i driver: - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever - Various other fixes" * tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits) iommu/mediatek: Fix forever loop in error handling iommu/mediatek: Fix crash on isr after kexec() iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY iommu/amd: Fix typo in macro parameter name iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data iommu/mediatek: Improve safety for mediatek,smi property in larb nodes iommu/mediatek: Validate number of phandles associated with "mediatek,larbs" iommu/mediatek: Add error path for loop of mm_dts_parse iommu/mediatek: Use component_match_add iommu/mediatek: Add platform_device_put for recovering the device refcnt iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() iommu/vt-d: Use real field for indication of first level iommu/vt-d: Remove unnecessary domain_context_mapped() iommu/vt-d: Rename domain_add_dev_info() iommu/vt-d: Rename iommu_disable_dev_iotlb() iommu/vt-d: Add blocking domain support iommu/vt-d: Add device_block_translation() helper iommu/vt-d: Allocate pasid table in device probe path iommu/amd: Check return value of mmu_notifier_register() iommu/amd: Fix pci device refcount leak in ppr_notifier() ...
| * s390/pci: use lock-free I/O translation updatesNiklas Schnelle2022-11-191-29/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I/O translation tables on s390 use 8 byte page table entries and tables which are allocated lazily but only freed when the entire I/O translation table is torn down. Also each IOVA can at any time only translate to one physical address Furthermore I/O table accesses by the IOMMU hardware are cache coherent. With a bit of care we can thus use atomic updates to manipulate the translation table without having to use a global lock at all. This is done analogous to the existing I/O translation table handling code used on Intel and AMD x86 systems. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20221109142903.4080275-6-schnelle@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| * iommu/s390: Use RCU to allow concurrent domain_list iterationNiklas Schnelle2022-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The s390_domain->devices list is only added to when new devices are attached but is iterated through in read-only fashion for every mapping operation as well as for I/O TLB flushes and thus in performance critical code causing contention on the s390_domain->list_lock. Fortunately such a read-mostly linked list is a standard use case for RCU. This change closely follows the example fpr RCU protected list given in Documentation/RCU/listRCU.rst. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20221109142903.4080275-4-schnelle@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| * iommu/s390: Make attach succeed even if the device is in error stateNiklas Schnelle2022-11-192-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a zPCI device is in the error state while switching IOMMU domains zpci_register_ioat() will fail and we would end up with the device not attached to any domain. In this state since zdev->dma_table == NULL a reset via zpci_hot_reset_device() would wrongfully re-initialize the device for DMA API usage using zpci_dma_init_device(). As automatic recovery is currently disabled while attached to an IOMMU domain this only affects slot resets triggered through other means but will affect automatic recovery once we switch to using dma-iommu. Additionally with that switch common code expects attaching to the default domain to always work so zpci_register_ioat() should only fail if there is no chance to recover anyway, e.g. if the device has been unplugged. Improve the robustness of attach by specifically looking at the status returned by zpci_mod_fc() to determine if the device is unavailable and in this case simply ignore the error. Once the device is reset zpci_hot_reset_device() will then correctly set the domain's DMA translation tables. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20221109142903.4080275-2-schnelle@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* | Merge tag 's390-6.2-1' of ↵Linus Torvalds2022-12-121-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Factor out handle_write() function and simplify 3215 console write operation - When 3170 terminal emulator is connected to the 3215 console driver the boot time could be very long due to limited buffer space or missing operator input. Add con3215_drop command line parameter and con3215_drop sysfs attribute file to instruct the kernel drop console data when such conditions are met - Fix white space errors in 3215 console driver - Move enum paiext_mode definition to a header file and rename it to paievt_mode to indicate this is now used for several events. Rename PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with PAI_MODE_SAMPLING - Simplify the logic of PMU pai_crypto mapped buffer reference counter and make it consistent with PMU pai_ext - Rename PMU pai_crypto mapped buffer structure member users to active_events to make it consistent with PMU pai_ext - Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per 2G hugetlb page (~1.6%) - Use generic serial.h, bugs.h, shmparam.h and vga.h header files and scrap s390-specific versions - The generic percpu setup code does not expect the s390-like implementation and emits a warning. To get rid of that warning and provide sane CPU-to-node and CPU-to-CPU distance mappings implementat a minimal version of setup_per_cpu_areas() - Use kstrtobool() instead of strtobool() for re-IPL sysfs device attributes - Avoid unnecessary lookup of a pointer to MSI descriptor when setting IRQ affinity for a PCI device - Get rid of "an incompatible function type cast" warning by changing debug_sprintf_format_fn() function prototype so it matches the debug_format_proc_t function type - Remove unused info_blk_hdr__pcpus() and get_page_state() functions - Get rid of clang "unused unused insn cache ops function" warning by moving s390_insn definition to a private header - Get rid of clang "unused function" warning by making function raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is enabled - Use kstrobool() to parse sclp_con_drop parameter to make it identical to the con3215_drop parameter and allow passing values like "yes" and "true" - Use sysfs_emit() for all SCLP sysfs show functions, which is the current standard way to generate output strings - Make SCLP con_drop sysfs attribute also writable and allow to change its value during runtime. This makes SCLP console drop handling consistent with the 3215 device driver - Virtual and physical addresses are indentical on s390. However, there is still a confusion when pointers are directly casted to physical addresses or vice versa. Use correct address converters virt_to_phys() and phys_to_virt() for s390 channel IO drivers - Support for power managemant has been removed from s390 since quite some time. Remove unused power managemant code from the appldata device driver - Allow memory tools like KASAN see memory accesses from the checksum code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does - Add support of ECKD DASDs disks so it could be used as boot and dump devices - Follow checkpatch recommendations and use octal values instead of S_IRUGO and S_IWUSR for dump device attributes in sysfs - Changes to vx-insn.h do not cause a recompile of C files that use asm(".include \"asm/vx-insn.h\"\n") magic to access vector instruction macros from inline assemblies. Add wrapper include header file to avoid this problem - Use vector instruction macros instead of byte patterns to increase register validation routine readability - The current machine check register validation handling does not take into account various scenarios and might lead to killing a wrong user process or potentially ignore corrupted FPU registers. Simplify logic of the machine check handler and stop the whole machine if the previous context was kerenel mode. If the previous context was user mode, kill the current task - Introduce sclp_emergency_printk() function which can be used to emit a message in emergency cases. It is supposed to be used in cases where regular console device drivers may not work anymore, e.g. unrecoverable machine checks Keep the early Service-Call Control Block so it can also be used after initdata has been freed to allow sclp_emergency_printk() implementation - In case a system will be stopped because of an unrecoverable machine check error print the machine check interruption code to give a hint of what went wrong - Move storage error checking from the assembly entry code to C in order to simplify machine check handling. Enter the handler with DAT turned on, which simplifies the entry code even more - The machine check extended save areas are allocated using a private "nmi_save_areas" slab cache which guarantees a required power-of-two alignment. Get rid of that cache in favour of kmalloc() * tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (38 commits) s390/nmi: get rid of private slab cache s390/nmi: move storage error checking back to C, enter with DAT on s390/nmi: print machine check interruption code before stopping system s390/sclp: introduce sclp_emergency_printk() s390/sclp: keep sclp_early_sccb s390/nmi: rework register validation handling s390/nmi: use vector instruction macros instead of byte patterns s390/vx: add vx-insn.h wrapper include file s390/ipl: use octal values instead of S_* macros s390/ipl: add eckd dump support s390/ipl: add eckd support vfio/ccw: identify CCW data addresses as physical vfio/ccw: sort out physical vs virtual pointers usage s390/checksum: support GENERIC_CSUM, enable it for KASAN s390/appldata: remove power management callbacks s390/cio: sort out physical vs virtual pointers usage s390/sclp: allow to change sclp_console_drop during runtime s390/sclp: convert to use sysfs_emit() s390/sclp: use kstrobool() to parse sclp_con_drop parameter s390/3270: make raw3270_state_final() depend on CONFIG_TN3270_CONSOLE ...
| * s390/pci: Use irq_data_get_msi_desc()Thomas Gleixner2022-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | No point in doing another lookup of irq_data, it's already provided as an argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/linux-s390/8735aoui07.ffs@tglx/ [agordeev@linux.ibm.com added Link tag] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
* | s390/pci: add missing EX_TABLE entries to ↵Heiko Carstens2022-10-261-4/+4
|/ | | | | | | | | | | | | | __pcistg_mio_inuser()/__pcilg_mio_inuser() For some exception types the instruction address points behind the instruction that caused the exception. Take that into account and add the missing exception table entry. Cc: <stable@vger.kernel.org> Fixes: f058599e22d5 ("s390/pci: Fix s390_mmio_read/write with MIO") Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* Merge tag 's390-6.1-1' of ↵Linus Torvalds2022-10-091-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Make use of the IBM z16 processor activity instrumentation facility extension to count neural network processor assist operations: add a new PMU device driver so that perf can make use of this. - Rework memcpy_real() to avoid DAT-off mode. - Rework absolute lowcore access code. - Various small fixes and improvements all over the code. * tag 's390-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: remove unused bus_next field from struct zpci_dev s390/cio: remove unused ccw_device_force_console() declaration s390/pai: Add support for PAI Extension 1 NNPA counters s390/mm: fix no previous prototype warnings in maccess.c s390/mm: uninline copy_oldmem_kernel() function s390/mm,ptdump: add real memory copy page markers s390/mm: rework memcpy_real() to avoid DAT-off mode s390/dump: save IPL CPU registers once DAT is available s390/pci: convert high_memory to physical address s390/smp,ptdump: add absolute lowcore markers s390/smp: rework absolute lowcore access s390/smp: call smp_reinit_ipl_cpu() before scheduler is available s390/ptdump: add missing amode31 markers s390/mm: split lowcore pages with set_memory_4k() s390/mm: remove unused access parameter from do_fault_error() s390/delay: sync comment within __delay() with reality s390: move from strlcpy with unused retval to strscpy
| * s390/pci: convert high_memory to physical addressNiklas Schnelle2022-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use high_memory as a measure for amount of memory available in determining the required minimum size of our IOVA space with the assumption that one rarely maps more than the available memory for DMA. In special cases like mapping significant amounts of memory more than once this can still be tuned with the s390_iommu_apterture kernel parameter. In this use case high_memory is treated as a physical address. As high_memory is a virtual address however this means we need to convert it using virt_to_phys() before use Note that at the moment physical and virtual addresses are identical so this mismatch does not currently cause trouble. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* | KVM: s390: pci: Hook to access KVM lowlevel from VFIOPierre Morel2022-08-292-1/+12
|/ | | | | | | | | | | | | | | | | | | | | | | We have a cross dependency between KVM and VFIO when using s390 vfio_pci_zdev extensions for PCI passthrough To be able to keep both subsystem modular we add a registering hook inside the S390 core code. This fixes a build problem when VFIO is built-in and KVM is built as a module. Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Fixes: 09340b2fca007 ("KVM: s390: pci: add routines to start/stop interpretive execution") Cc: <stable@vger.kernel.org> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20220819122945.9309-1-pmorel@linux.ibm.com Message-Id: <20220819122945.9309-1-pmorel@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
* Merge tag 'pci-v5.20-changes' of ↵Linus Torvalds2022-08-041-62/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Consolidate duplicated 'next function' scanning and extend to allow 'isolated functions' on s390, similar to existing hypervisors (Niklas Schnelle) Resource management: - Implement pci_iobar_pfn() for sparc, which allows us to remove the sparc-specific pci_mmap_page_range() and pci_mmap_resource_range(). This removes the ability to map the entire PCI I/O space using /proc/bus/pci, but we believe that's already been broken since v2.6.28 (Arnd Bergmann) - Move common PCI definitions to asm-generic/pci.h and rework others to be be more specific and more encapsulated in arches that need them (Stafford Horne) Power management: - Convert drivers to new *_PM_OPS macros to avoid need for '#ifdef CONFIG_PM_SLEEP' or '__maybe_unused' (Bjorn Helgaas) Virtualization: - Add ACS quirk for Broadcom BCM5750x multifunction NICs that isolate the functions but don't advertise an ACS capability (Pavan Chebbi) Error handling: - Clear PCI Status register during enumeration in case firmware left errors logged (Kai-Heng Feng) - When we have native control of AER, enable error reporting for all devices that support AER. Previously only a few drivers enabled this (Stefan Roese) - Keep AER error reporting enabled for switches. Previously we enabled this during enumeration but immediately disabled it (Stefan Roese) - Iterate over error counters instead of error strings to avoid printing junk in AER sysfs counters (Mohamed Khalfella) ASPM: - Remove pcie_aspm_pm_state_change() so ASPM config changes, e.g., via sysfs, are not lost across power state changes (Kai-Heng Feng) Endpoint framework: - Don't stop an EPC when unbinding an EPF from it (Shunsuke Mie) Endpoint embedded DMA controller driver: - Simplify and clean up support for the DesignWare embedded DMA (eDMA) controller (Frank Li, Serge Semin) Broadcom STB PCIe controller driver: - Avoid config space accesses when link is down because we can't recover from the CPU aborts these cause (Jim Quinlan) - Look for power regulators described under Root Ports in DT and enable them before scanning the secondary bus (Jim Quinlan) - Disable/enable regulators in suspend/resume (Jim Quinlan) Freescale i.MX6 PCIe controller driver: - Simplify and clean up clock and PHY management (Richard Zhu) - Disable/enable regulators in suspend/resume (Richard Zhu) - Set PCIE_DBI_RO_WR_EN before writing DBI registers (Richard Zhu) - Allow speeds faster than Gen2 (Richard Zhu) - Make link being down a non-fatal error so controller probe doesn't fail if there are no Endpoints connected (Richard Zhu) Loongson PCIe controller driver: - Add ACPI and MCFG support for Loongson LS7A (Huacai Chen) - Avoid config reads to non-existent LS2K/LS7A devices because a hardware defect causes machine hangs (Huacai Chen) - Work around LS7A integrated devices that report incorrect Interrupt Pin values (Jianmin Lv) Marvell Aardvark PCIe controller driver: - Add support for AER and Slot capability on emulated bridge (Pali Rohár) MediaTek PCIe controller driver: - Add Airoha EN7532 to DT binding (John Crispin) - Allow building of driver for ARCH_AIROHA (Felix Fietkau) MediaTek PCIe Gen3 controller driver: - Print decoded LTSSM state when the link doesn't come up (Jianjun Wang) NVIDIA Tegra194 PCIe controller driver: - Convert DT binding to json-schema (Vidya Sagar) - Add DT bindings and driver support for Tegra234 Root Port and Endpoint mode (Vidya Sagar) - Fix some Root Port interrupt handling issues (Vidya Sagar) - Set default Max Payload Size to 256 bytes (Vidya Sagar) - Fix Data Link Feature capability programming (Vidya Sagar) - Extend Endpoint mode support to devices beyond Controller-5 (Vidya Sagar) Qualcomm PCIe controller driver: - Rework clock, reset, PHY power-on ordering to avoid hangs and improve consistency (Robert Marko, Christian Marangi) - Move pipe_clk handling to PHY drivers (Dmitry Baryshkov) - Add IPQ60xx support (Selvam Sathappan Periakaruppan) - Allow ASPM L1 and substates for 2.7.0 (Krishna chaitanya chundru) - Add support for more than 32 MSI interrupts (Dmitry Baryshkov) Renesas R-Car PCIe controller driver: - Convert DT binding to json-schema (Herve Codina) - Add Renesas RZ/N1D (R9A06G032) to rcar-gen2 DT binding and driver (Herve Codina) Samsung Exynos PCIe controller driver: - Fix phy-exynos-pcie driver so it follows the 'phy_init() before phy_power_on()' PHY programming model (Marek Szyprowski) Synopsys DesignWare PCIe controller driver: - Simplify and clean up the DWC core extensively (Serge Semin) - Fix an issue with programming the ATU for regions that cross a 4GB boundary (Serge Semin) - Enable the CDM check if 'snps,enable-cdm-check' exists; previously we skipped it if 'num-lanes' was absent (Serge Semin) - Allocate a 32-bit DMA-able page to be MSI target instead of using a driver data structure that may not be addressable with 32-bit address (Will McVicker) - Add DWC core support for more than 32 MSI interrupts (Dmitry Baryshkov) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Versal CPM5 Gen5 Root Port (Bharat Kumar Gogada)" * tag 'pci-v5.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (150 commits) PCI: imx6: Support more than Gen2 speed link mode PCI: imx6: Set PCIE_DBI_RO_WR_EN before writing DBI registers PCI: imx6: Reformat suspend callback to keep symmetric with resume PCI: imx6: Move the imx6_pcie_ltssm_disable() earlier PCI: imx6: Disable clocks in reverse order of enable PCI: imx6: Do not hide PHY driver callbacks and refine the error handling PCI: imx6: Reduce resume time by only starting link if it was up before suspend PCI: imx6: Mark the link down as non-fatal error PCI: imx6: Move regulator enable out of imx6_pcie_deassert_core_reset() PCI: imx6: Turn off regulator when system is in suspend mode PCI: imx6: Call host init function directly in resume PCI: imx6: Disable i.MX6QDL clock when disabling ref clocks PCI: imx6: Propagate .host_init() errors to caller PCI: imx6: Collect clock enables in imx6_pcie_clk_enable() PCI: imx6: Factor out ref clock disable to match enable PCI: imx6: Move imx6_pcie_clk_disable() earlier PCI: imx6: Move imx6_pcie_enable_ref_clk() earlier PCI: imx6: Move PHY management functions together PCI: imx6: Move imx6_pcie_grp_offset(), imx6_pcie_configure_type() earlier PCI: imx6: Convert to NOIRQ_SYSTEM_SLEEP_PM_OPS() ...
| * s390/pci: allow zPCI zbus without a function zeroNiklas Schnelle2022-07-221-62/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the zPCI code blocks PCI bus creation and probing of a zPCI zbus unless there is a PCI function with devfn 0. This is always the case for the PCI functions with hidden RID, but may keep PCI functions from a multi-function PCI device with RID information invisible until the function 0 becomes visible. Worse, as a PCI bus is necessary to even present a PCI hotplug slot, even that remains invisible. With the probing of these so-called isolated PCI functions enabled for s390 in common code, this restriction is no longer necessary. On network cards with multiple ports and a PF per port this also allows using each port on its own while still providing the physical PCI topology information in the devfn needed to associate VFs with their parent PF. Link: https://lore.kernel.org/r/20220628143100.3228092-6-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
* | KVM: s390: pci: add routines to start/stop interpretive executionMatthew Rosato2022-07-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | These routines will be invoked at the time an s390x vfio-pci device is associated with a KVM (or when the association is removed), allowing the zPCI device to enable or disable load/store intepretation mode; this requires the host zPCI device to inform firmware of the unique token (GISA designation) that is associated with the owning KVM. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-17-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | KVM: s390: pci: provide routines for enabling/disabling interrupt forwardingMatthew Rosato2022-07-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | These routines will be wired into a kvm ioctl in order to respond to requests to enable / disable a device for Adapter Event Notifications / Adapter Interuption Forwarding. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-16-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | KVM: s390: pci: do initial setup for AEN interpretationMatthew Rosato2022-07-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | Initial setup for Adapter Event Notification Interpretation for zPCI passthrough devices. Specifically, allocate a structure for forwarding of adapter events and pass the address of this structure to firmware. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-13-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | s390/pci: stash dtsm and maxstblMatthew Rosato2022-07-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Store information about what IOAT designation types are supported by underlying hardware as well as the largest store block size allowed. These values will be needed by passthrough. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-10-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | s390/pci: stash associated GISA designationMatthew Rosato2022-07-113-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For passthrough devices, we will need to know the GISA designation of the guest if interpretation facilities are to be used. Setup to stash this in the zdev and set a default of 0 (no GISA designation) for now; a subsequent patch will set a valid GISA designation for passthrough devices. Also, extend mpcific routines to specify this stashed designation as part of the mpcific command. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-9-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | s390/pci: externalize the SIC operation controls and routineMatthew Rosato2022-07-112-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A subsequent patch will be issuing SIC from KVM -- export the necessary routine and make the operation control definitions available from a header. Because the routine will now be exported, let's rename __zpci_set_irq_ctrl to zpci_set_irq_ctrl and get rid of the zero'd iib wrapper function of the same name. Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-8-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | s390/airq: allow for airq structure that uses an input vectorMatthew Rosato2022-07-111-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing device passthrough where interrupts are being forwarded from host to guest, we wish to use a pinned section of guest memory as the vector (the same memory used by the guest as the vector). To accomplish this, add a new parameter for airq_iv_create which allows passing an existing vector to be used instead of allocating a new one. The caller is responsible for ensuring the vector is pinned in memory as well as for unpinning the memory when the vector is no longer needed. A subsequent patch will use this new parameter for zPCI interpretation. Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-7-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | s390/airq: pass more TPI info to airq handlersMatthew Rosato2022-07-111-2/+7
|/ | | | | | | | | | | | | | | | | A subsequent patch will introduce an airq handler that requires additional TPI information beyond directed vs floating, so pass the entire tpi_info structure via the handler. Only pci actually uses this information today, for the other airq handlers this is effectively a no-op. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-6-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* s390/pci: add error record for CC 2 retriesNiklas Schnelle2022-04-251-17/+57
| | | | | | | | | | | | | | | | | | | Currently it is not detectable from within Linux when PCI instructions are retried because of a busy condition. Detecting such conditions and especially how long they lasted can however be quite useful in problem determination. This patch enables this by adding an s390dbf error log when a CC 2 is first encountered as well as after the retried instruction. Despite being unlikely it may be possible that these added debug messages drown out important other messages so allow setting the debug level in zpci_err_insn*() and set their level to 1 so they can be filtered out if need be. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/pci: add PCI access type and length to error recordsNiklas Schnelle2022-04-251-15/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when a PCI instruction returns a non-zero condition code it can be very hard to tell from the s390dbf logs what kind of instruction was executed. In case of PCI memory I/O (MIO) instructions it is even impossible to tell if we attempted a load, store or block store or how large the access was because only the address is logged. Improve this by adding an indicator byte for the instruction type to the error record and also store the length of the access for MIO instructions where this can not be deduced from the request. We use the following indicator values: - 'l': PCI load - 's': PCI store - 'b': PCI store block - 'L': PCI load (MIO) - 'S': PCI store (MIO) - 'B': PCI store block (MIO) - 'M': MPCIFC - 'R': RPCIT Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/pci: don't log availability events as errorsNiklas Schnelle2022-04-251-3/+0
| | | | | | | | | | | | | | | | | | | | | Availability events are logged in s390dbf in s390dbf/pci_error/hex_ascii even though they don't indicate an error condition. They have also become redundant as commit 6526a597a2e85 ("s390/pci: add simpler s390dbf traces for events") added an s390dbf/pci_msg/sprintf log entry for availability events which contains all non reserved fields of struct zpci_ccdf_avail. On the other hand the availability entries in the error log make it easy to miss actual errors and may even overwrite error entries if the message buffer wraps. Thus simply remove the availability events from the error log thereby establishing the rule that any content in s390dbf/pci_error indicates some kind of error. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/pci: make better use of zpci_dbg() levelsNiklas Schnelle2022-04-253-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | While the zpci_dbg() macro offers a level parameter this is currently largely unused. The only instance with higher importance than 3 is the UID checking change debug message which is not actually more important as the UID uniqueness guarantee is already exposed in sysfs so this should rather be 3 as well. On the other hand the "add ..." message which shows what devices are visible at the lowest level is essential during problem determination. By setting its level to 1, lowering the debug level can act as a filter to only show the available functions. On the error side the default level is set to 6 while all existing messages are printed at level 0. This is inconsistent and means there is no room for having messages be invisible on the default level so instead set the default level to 3 like for errors matching the default for debug messages. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/pci: rename get_zdev_by_bus() to zdev_from_bus()Niklas Schnelle2022-03-272-4/+4
| | | | | | | | | | | | | | | | | Getting a zpci_dev via get_zdev_by_bus() uses the long lived reference held in zbus->function[devfn]. This is accounted for in pcibios_add_device() and pcibios_release_device(). Therefore there is no need to increment the reference count in get_zdev_by_bus() as is done for get_zdev_by_fid(). Instead callers must not access the device after pcibios_release_device() was called which is necessary for common PCI code anyway. With this though the very similar naming may be misleading so rename get_zdev_by_bus() to zdev_from_bus() emphasizing that we are directly referencing the zdev via the bus. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390/pci: improve zpci_dev reference countingNiklas Schnelle2022-03-274-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently zpci_dev uses kref based reference counting but only accounts for one original reference plus one reference from an added pci_dev to its underlying zpci_dev. Counting just the original reference worked until the pci_dev reference was added in commit 2a671f77ee49 ("s390/pci: fix use after free of zpci_dev") because once a zpci_dev goes away, i.e. enters the reserved state, it would immediately get released. However with the pci_dev reference this is no longer the case and the zpci_dev may still appear in multiple availability events indicating that it was reserved. This was solved by detecting when the zpci_dev is already on its way out but still hanging around. This has however shown some light on how unusual our zpci_dev reference counting is. Improve upon this by modelling zpci_dev reference counting on pci_dev. Analogous to pci_get_slot() increment the reference count in get_zdev_by_fid(). Thus all users of get_zdev_by_fid() must drop the reference once they are done with the zpci_dev. Similar to pci_scan_single_device(), zpci_create_device() returns the device with an initial count of 1 and the device added to the zpci_list (analogous to the PCI bus' device_list). In turn users of zpci_create_device() must only drop the reference once the device is gone from the point of view of the zPCI subsystem, it might still be referenced by the common PCI subsystem though. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390/pci: make zpci_set_irq()/zpci_clear_irq() staticNiklas Schnelle2022-03-101-2/+2
| | | | | | | | | | | | | Commit c1e18c17bda68 ("s390/pci: add zpci_set_irq()/zpci_clear_irq()") made zpci_set_irq()/zpci_clear_irq() non-static in preparation for using them in zpci_hot_reset_device(). The version of zpci_hot_reset_device() that was finally merged however exploits the fact that IRQs and DMA is implicitly disabled by clp_disable_fh() so the call to zpci_clear_irq() was never added. There are no other calls outside pci_irq.c so lets make both functions static. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390/extable: move EX_TABLE define to asm-extable.hHeiko Carstens2022-03-083-0/+3
| | | | | | | | | | | | | | Follow arm64 and riscv and move the EX_TABLE define to asm-extable.h which is a lot less generic than the current linkage.h. Also make sure that all files which contain EX_TABLE usages actually include the new header file. This should make sure that the files always compile and there won't be any random compile breakage due to other header file dependencies. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* Merge tag 'irq-msi-2022-01-13' of ↵Linus Torvalds2022-01-131-6/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI irq updates from Thomas Gleixner: "Rework of the MSI interrupt infrastructure. This is a treewide cleanup and consolidation of MSI interrupt handling in preparation for further changes in this area which are necessary to: - address existing shortcomings in the VFIO area - support the upcoming Interrupt Message Store functionality which decouples the message store from the PCI config/MMIO space" * tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits) genirq/msi: Populate sysfs entry only once PCI/MSI: Unbreak pci_irq_get_affinity() genirq/msi: Convert storage to xarray genirq/msi: Simplify sysfs handling genirq/msi: Add abuse prevention comment to msi header genirq/msi: Mop up old interfaces genirq/msi: Convert to new functions genirq/msi: Make interrupt allocation less convoluted platform-msi: Simplify platform device MSI code platform-msi: Let core code handle MSI descriptors bus: fsl-mc-msi: Simplify MSI descriptor handling soc: ti: ti_sci_inta_msi: Remove ti_sci_inta_msi_domain_free_irqs() soc: ti: ti_sci_inta_msi: Rework MSI descriptor allocation NTB/msi: Convert to msi_on_each_desc() PCI: hv: Rework MSI handling powerpc/mpic_u3msi: Use msi_for_each-desc() powerpc/fsl_msi: Use msi_for_each_desc() powerpc/pasemi/msi: Convert to msi_on_each_dec() powerpc/cell/axon_msi: Convert to msi_on_each_desc() powerpc/4xx/hsta: Rework MSI handling ...
| * s390/pci: Rework MSI descriptor walkThomas Gleixner2021-12-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | Replace the about to vanish iterators and make use of the filtering. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20211206210748.305656158@linutronix.de
| * PCI/MSI: Make arch_restore_msi_irqs() less horrible.Thomas Gleixner2021-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Make arch_restore_msi_irqs() return a boolean which indicates whether the core code should restore the MSI message or not. Get rid of the indirection in x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI Link: https://lore.kernel.org/r/20211206210224.485668098@linutronix.de
* | s390/pci: simplify __pciwb_mio() inline asmNiklas Schnelle2022-01-071-4/+1
| | | | | | | | | | | | | | | | | | | | The PCI Write Barrier instruction ignores the registers encoded in it. There is thus no need to explicitly set the register to zero or to associate it with a variable at all. In the resulting binary this removes an unnecessary lghi and it makes the code simpler. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* | s390/pci: use physical addresses in DMA tablesNiklas Schnelle2021-12-062-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The entries in the DMA translation tables for our IOMMU must specify physical addresses of either the next level table or the final page to be mapped for DMA. Currently however the code simply passes the virtual addresses of both. On the other hand we still need to walk the tables via their virtual addresses so we need to do a phys_to_virt() when setting the entries and a virt_to_phys() when getting them. Similarly when passing the I/O translation anchor to the hardware we must also specify its physical address. As the DMA and IOMMU APIs we are implementing already use the correct phys_addr_t type for the address to be mapped let's also thread this through instead of treating it as just an unsigned long. Note: this currently doesn't fix a real bug, since virtual addresses are indentical to physical ones. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* | s390/pci: use phys_to_virt() for AIBVs/DIBVsNiklas Schnelle2021-12-061-3/+3
|/ | | | | | | | | | | | | The adapter and directed interrupt bit vectors need to be referenced in the FIB with their physical not their virtual address. So use virt_to_phys() as approrpiate. Note: this currently doesn't fix a real bug, since virtual addresses are indentical to physical ones. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/pci: implement minimal PCI error recoveryNiklas Schnelle2021-11-082-3/+274
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the platform detects an error on a PCI function or a service action has been performed it is put in the error state and an error event notification is provided to the OS. Currently we treat all error event notifications the same and simply set pdev->error_state = pci_channel_io_perm_failure requiring user intervention such as use of the recover attribute to get the device usable again. Despite requiring a manual step this also has the disadvantage that the device is completely torn down and recreated resulting in higher level devices such as a block or network device being recreated. In case of a block device this also means that it may need to be removed and added to a software raid even if that could otherwise survive with a temporary degradation. This is of course not ideal more so since an error notification with PEC 0x3A indicates that the platform already performed error recovery successfully or that the error state was caused by a service action that is now finished. At least in this case we can assume that the error state can be reset and the function made usable again. So as not to have the disadvantage of a full tear down and recreation we need to coordinate this recovery with the driver. Thankfully there is already a well defined recovery flow for this described in Documentation/PCI/pci-error-recovery.rst. The implementation of this is somewhat straight forward and simplified by the fact that our recovery flow is defined per PCI function. As a reset we use the newly introduced zpci_hot_reset_device() which also takes the PCI function out of the error state. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390/pci: implement reset_slot for hotplug slotNiklas Schnelle2021-11-082-0/+68
| | | | | | | | | | | | | | | | | | This is done by adding a zpci_hot_reset_device() call which does a low level reset of the PCI function without changing its higher level function state. This way it can be used while the zPCI function is bound to a driver and with DMA tables being controlled either through the IOMMU or DMA APIs which is prohibited when using zpci_disable_device() as that drop existing DMA translations. As this reset, unlike a normal FLR, also calls zpci_clear_irq() we need to implement arch_restore_msi_irqs() and make sure we re-enable IRQs for the PCI function if they were previously disabled. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390/pci: refresh function handle in iomapNiklas Schnelle2021-11-083-9/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function handle of a PCI function is updated when disabling or enabling it as well as when the function's availability changes or it enters the error state. Until now this only occurred either while there is no struct pci_dev associated with the function yet or the function became unavailable. This meant that leaving a stale function handle in the iomap either didn't happen because there was no iomap yet or it lead to errors on PCI access but so would the correct disabled function handle. In the future a CLP Set PCI Function Disable/Enable cycle during PCI device recovery may be done while the device is bound to a driver. In this case we must update the iomap associated with the now-stale function handle to ensure that the resulting zPCI instruction references an accurate function handle. Since the function handle is accessed by the PCI accessor helpers without locking use READ_ONCE()/WRITE_ONCE() to mark this access and prevent compiler optimizations that would move the load/store. With that infrastructure in place let's also properly update the function handle in the existing cases. This makes sure that in the future debugging of a zPCI function access through the handle will show an up to date handle reducing the chance of confusion. Also it makes sure we have one single place where a zPCI function handle is updated after initialization. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* Merge tag 's390-5.16-1' of ↵Linus Torvalds2021-11-063-2/+35
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for ftrace with direct call and ftrace direct call samples. - Add support for kernel command lines longer than current 896 bytes and make its length configurable. - Add support for BEAR enhancement facility to improve last breaking event instruction tracking. - Add kprobes sanity checks and testcases to prevent kprobe in the mid of an instruction. - Allow concurrent access to /dev/hwc for the CPUMF users. - Various ftrace / jump label improvements. - Convert unwinder tests to KUnit. - Add s390_iommu_aperture kernel parameter to tweak the limits on concurrently usable DMA mappings. - Add ap.useirq AP module option which can be used to disable interrupt use. - Add add_disk() error handling support to block device drivers. - Drop arch specific and use generic implementation of strlcpy and strrchr. - Several __pa/__va usages fixes. - Various cio, crypto, pci, kernel doc and other small fixes and improvements all over the code. [ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ] * tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits) s390: make command line configurable s390: support command lines longer than 896 bytes s390/kexec_file: move kernel image size check s390/pci: add s390_iommu_aperture kernel parameter s390/spinlock: remove incorrect kernel doc indicator s390/string: use generic strlcpy s390/string: use generic strrchr s390/ap: function rework based on compiler warning s390/cio: make ccw_device_dma_* more robust s390/vfio-ap: s390/crypto: fix all kernel-doc warnings s390/hmcdrv: fix kernel doc comments s390/ap: new module option ap.useirq s390/cpumf: Allow multiple processes to access /dev/hwc s390/bitops: return true/false (not 1/0) from bool functions s390: add support for BEAR enhancement facility s390: introduce nospec_uses_trampoline() s390: rename last_break to pgm_last_break s390/ptrace: add last_break member to pt_regs s390/sclp: sort out physical vs virtual pointers usage s390/setup: convert start and end initrd pointers to virtual ...
| * s390/pci: add s390_iommu_aperture kernel parameterNiklas Schnelle2021-10-261-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some applications map the same memory area for DMA multiple times while also mapping significant amounts of memory. With our current DMA code these applications will run out of DMA addresses after mapping half of the available memory because the number of DMA mappings is constrained by the number of concurrently active DMA addresses we support which in turn is limited by the minimum of hardware constraints and high_memory. Limiting the number of active DMA addresses to high_memory is only a heuristic to save memory used by the iommu_bitmap and DMA page tables however. This was added under the assumption that it rarely makes sense to DMA map more than system memory. To accommodate special applications which insist on double mapping, which works on other platforms, allow specifying a factor of how many times installed memory is available as DMA address space. Use 0 as a special value to apply no constraints beyond what hardware dictates at the expense of significantly more memory use. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>