summaryrefslogtreecommitdiffstats
path: root/drivers/dma/idxd
Commit message (Collapse)AuthorAgeFilesLines
*-. Merge branches 'apple/dart', 'arm/mediatek', 'arm/renesas', 'arm/rockchip', ↵Joerg Roedel2023-08-215-38/+76
|\ \ | | | | | | | | | 'arm/smmu', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next
| | * dmaengine/idxd: Re-enable kernel workqueue under DMA APIJacob Pan2023-08-095-38/+76
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel workqueues were disabled due to flawed use of kernel VA and SVA API. Now that we have the support for attaching PASID to the device's default domain and the ability to reserve global PASIDs from SVA APIs, we can re-enable the kernel work queues and use them under DMA API. We also use non-privileged access for in-kernel DMA to be consistent with the IOMMU settings. Consequently, interrupt for user privilege is enabled for work completion IRQs. Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/ Tested-by: Tony Zhu <tony.zhu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-9-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
* / dmaengine: idxd: Clear PRS disable flag when disabling IDXD deviceFenghua Yu2023-08-071-3/+1
|/ | | | | | | | | | | | | | | | | | | | | Disabling IDXD device doesn't reset Page Request Service (PRS) disable flag to its initial value 0. This may cause user confusion because once PRS is disabled user will see PRS still remains the previous setting (i.e. disabled) via sysfs interface even after the device is disabled. To eliminate user confusion, reset PRS disable flag to ensure that the PRS flag bit reflects correct state after the device is disabled. Additionally, simplify the code by setting wq->flags to 0, which clears all flag bits, including any future additions. Fixes: f2dc327131b5 ("dmaengine: idxd: add per wq PRS disable") Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230712193505.3440752-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* dmaengine: idxd: Fix passing freed memory in idxd_cdev_open()Harshit Mogalapalli2023-05-171-1/+0
| | | | | | | | | | | | | | | | | | | | | Smatch warns: drivers/dma/idxd/cdev.c:327: idxd_cdev_open() warn: 'sva' was already freed. When idxd_wq_set_pasid() fails, the current code unbinds sva and then goes to 'failed_set_pasid' where iommu_sva_unbind_device is called again causing the above warning. [ device_user_pasid_enabled(idxd) is still true when calling failed_set_pasid ] Fix this by removing additional unbind when idxd_wq_set_pasid() fails Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230509060716.2830630-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* Merge tag 'dmaengine-6.4-rc1' of ↵Linus Torvalds2023-05-039-69/+1127
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Apple admac t8112 device support - StarFive JH7110 DMA controller Updates: - Big pile of idxd updates to support IAA 2.0 device capabilities, DSA 2.0 Event Log and completion record faulting features and new DSA operations - at_xdmac supend & resume updates and driver code cleanup - k3-udma supend & resume support - k3-psil thread support for J784s4" * tag 'dmaengine-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (57 commits) dmaengine: idxd: add per wq PRS disable dmaengine: idxd: add pid to exported sysfs attribute for opened file dmaengine: idxd: expose fault counters to sysfs dmaengine: idxd: add a device to represent the file opened dmaengine: idxd: add per file user counters for completion record faults dmaengine: idxd: process batch descriptor completion record faults dmaengine: idxd: add descs_completed field for completion record dmaengine: idxd: process user page faults for completion record dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling dmaengine: idxd: create kmem cache for event log fault items dmaengine: idxd: add per DSA wq workqueue for processing cr faults dmanegine: idxd: add debugfs for event log dump dmaengine: idxd: add interrupt handling for event log dmaengine: idxd: setup event log configuration dmaengine: idxd: add event log size sysfs attribute dmaengine: idxd: make misc interrupt one shot dt-bindings: dma: snps,dw-axi-dmac: constrain the items of resets for JH7110 dma dt-bindings: dma: Drop unneeded quotes dmaengine: at_xdmac: align declaration of ret with the rest of variables dmaengine: at_xdmac: add a warning message regarding for unpaused channels ...
| * dmaengine: idxd: add per wq PRS disableDave Jiang2023-04-124-5/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | Add sysfs knob for per wq Page Request Service disable. This knob disables PRS support for the specific wq. When this bit is set, it also overrides the wq's block on fault enabling. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-17-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add pid to exported sysfs attribute for opened fileDave Jiang2023-04-121-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Provide the pid of the application for the opened file. This allows the monitor daemon to easily correlate which app opened the file and easily kill the app by pid if that is desired action. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-16-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: expose fault counters to sysfsDave Jiang2023-04-121-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose cr_faults and cr_fault_failures counters to the user space. This allows a user app to keep track of how many fault the application is causing with the completion record (CR) and also the number of failures of the CR writeback. Having a high number of cr_fault_failures is bad as the app is submitting descriptors with the CR addresses that are bad. User monitoring daemon may want to consider killing the application as it may be malicious and attempting to flood the device event log. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-15-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add a device to represent the file openedDave Jiang2023-04-122-24/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | Embed a struct device for the user file context in order to export sysfs attributes related with the opened file. Tie the lifetime of the file context to the device. The sysfs entry will be added under the char device. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-14-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add per file user counters for completion record faultsDave Jiang2023-04-123-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add counters per opened file for the char device in order to keep track how many completion record faults occurred and how many of those faults failed the writeback by the driver after attempt to fault in the page. The counters are managed by xarray that associates the PASID with struct idxd_user_context. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-13-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: process batch descriptor completion record faultsDave Jiang2023-04-124-25/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add event log processing for faulting of user batch descriptor completion record. When encountering an event log entry for a page fault on a completion record, the driver is expected to do the following: 1. If the "first error in batch" bit in event log entry error info is set, discard any previously recorded errors associated with the "batch identifier". 2. Fix the page fault according to the fault address in the event log. If successful, write the completion record to the fault address in user space. 3. If an error is encountered while writing the completion record and it is associated to a descriptor in the batch, the driver associates the error with the batch identifier of the event log entry and tracks it until the event log entry for the corresponding batch desc is encountered. While processing an event log entry for a batch descriptor with error indicating that one or more descs in the batch had event log entries, the driver will do the following before writing the batch completion record: 1. If the status field of the completion record is 0x1, the driver will change it to error code 0x5 (one or more operations in batch completed with status not successful) and changes the result field to 1. 2. If the status is error code 0x6 (page fault on batch descriptor list address), change the result field to 1. 3. If status is any other value, the completion record is not changed. 4. Clear the recorded error in preparation for next batch with same batch identifier. The result field is for user software to determine whether to set the "Batch Error" flag bit in the descriptor for continuation of partial batch descriptor completion. See DSA spec 2.0 for additional information. If no error has been recorded for the batch, the batch completion record is written to user space as is. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-12-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: process user page faults for completion recordDave Jiang2023-04-125-7/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DSA supports page fault handling through PRS. However, the DMA engine that's processing the descriptor is blocked until the PRS response is received. Other workqueues sharing the engine are also blocked. Page fault handing by the driver with PRS disabled can be used to mitigate the stalling. With PRS disabled while ATS remain enabled, DSA handles page faults on a completion record by reporting an event in the event log. In this instance, the descriptor is completed and the event log contains the completion record address and the contents of the completion record. Add support to the event log handling code to fault in the completion record and copy the content of the completion record to user memory. A bitmap is introduced to keep track of discarded event log entries. When the user process initiates ->release() of the char device, it no longer is interested in any remaining event log entries tied to the relevant wq and PASID. The driver will mark the event log entry index in the bitmap. Upon encountering the entries during processing, the event log handler will just clear the bitmap bit and skip the entry rather than attempt to process the event log entry. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-10-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add idxd_copy_cr() to copy user completion record during ↵Fenghua Yu2023-04-124-5/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | page fault handling Define idxd_copy_cr() to copy completion record to fault address in user address that is found by work queue (wq) and PASID. It will be used to write the user's completion record that the hardware device is not able to write due to user completion record page fault. An xarray is added to associate the PASID and mm with the struct idxd_user_context so mm can be found by PASID and wq. It is called when handling the completion record fault in a kernel thread context. Switch to the mm using kthread_use_vm() and copy the completion record to the mm via copy_to_user(). Once the copy is completed, switch back to the current mm using kthread_unuse_mm(). Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-9-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: create kmem cache for event log fault itemsDave Jiang2023-04-123-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a kmem cache per device for allocating event log fault context. The context allows an event log entry to be copied and passed to a software workqueue to be processed. Due to each device can have different sized event log entry depending on device type, it's not possible to have a global kmem cache. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-8-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add per DSA wq workqueue for processing cr faultsDave Jiang2023-04-122-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Add a workqueue for user submitted completion record fault processing. The workqueue creation and destruction lifetime will be tied to the user sub-driver since it will only be used when the wq is a user type. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-7-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmanegine: idxd: add debugfs for event log dumpDave Jiang2023-04-124-1/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add debugfs entry to dump the content of the event log for debugging. The function will dump all non-zero entries in the event log. It will note which entries are processed and which entries are still pending processing at the time of the dump. The entries may not always be in chronological order due to the log is a circular buffer. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-6-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add interrupt handling for event logDave Jiang2023-04-122-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An event log interrupt is raised in the misc interrupt INTCAUSE register when an event is written by the hardware. Add basic event log processing support to the interrupt handler. The event log is a ring where the hardware owns the tail and the software owns the head. The hardware will advance the tail index when an additional event has been pushed to memory. The software will process the log entry and then advances the head. The log is full when (tail + 1) % log_size = head. The hardware will stop writing when the log is full. The user is expected to create a log size large enough to handle all the expected events. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-5-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: setup event log configurationDave Jiang2023-04-125-4/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add setup of event log feature for supported device. Event log addresses error reporting that was lacking in gen 1 DSA devices where a second error event does not get reported when a first event is pending software handling. The event log allows a circular buffer that the device can push error events to. It is up to the user to create a large enough event log ring in order to capture the expected events. The evl size can be set in the device sysfs attribute. By default 64 entries are supported as minimal when event log is enabled. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-4-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: add event log size sysfs attributeDave Jiang2023-04-124-1/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for changing of the event log size. Event log is a feature added to DSA 2.0 hardware to improve error reporting. It supersedes the SWERROR register on DSA 1.0 hardware and hope to prevent loss of reported errors. The error log size determines how many error entries supported for the device. It can be configured by the user via sysfs attribute. Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-3-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: make misc interrupt one shotDave Jiang2023-04-121-26/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code continuously processes the interrupt as long as the hardware is setting the status bit. There's no reason to do that since the threaded handler will get called again if another interrupt is asserted. Also through testing, it has shown that if a misprogrammed (or malicious) agent can continuously submit descriptors with bad completion record and causes errors to be reported via the misc interrupt. Continuous processing by the thread can cause software hang watchdog to kick off since the thread isn't giving up the CPU. Reported-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230407203143.2189681-2-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: expose IAA CAP register via sysfs knobDave Jiang2023-03-314-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | Add IAA (IAX) capability mask sysfs attribute to expose to applications. The mask provides application knowledge of what capabilities this IAA device supports. This mask is available for IAA 2.0 device or later. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230303213732.3357494-3-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: reformat swerror output to standard Linux bitmap outputDave Jiang2023-03-313-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | SWERROR register is 4 64bit wide registers. Currently the sysfs attribute just outputs 4 64bit hex integers. Convert to output with %*pb format specifier. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230303213732.3357494-2-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: Remove unnecessary aer.h includeBjorn Helgaas2023-03-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | <linux/aer.h> is unused, so remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230307192655.874008-3-helgaas@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | Merge tag 'iommu-updates-v6.4' of ↵Linus Torvalds2023-04-304-13/+32
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Convert to platform remove callback returning void - Extend changing default domain to normal group - Intel VT-d updates: - Remove VT-d virtual command interface and IOASID - Allow the VT-d driver to support non-PRI IOPF - Remove PASID supervisor request support - Various small and misc cleanups - ARM SMMU updates: - Device-tree binding updates: * Allow Qualcomm GPU SMMUs to accept relevant clock properties * Document Qualcomm 8550 SoC as implementing an MMU-500 * Favour new "qcom,smmu-500" binding for Adreno SMMUs - Fix S2CR quirk detection on non-architectural Qualcomm SMMU implementations - Acknowledge SMMUv3 PRI queue overflow when consuming events - Document (in a comment) why ATS is disabled for bypass streams - AMD IOMMU updates: - 5-level page-table support - NUMA awareness for memory allocations - Unisoc driver: Support for reattaching an existing domain - Rockchip driver: Add missing set_platform_dma_ops callback - Mediatek driver: Adjust the dma-ranges - Various other small fixes and cleanups * tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits) iommu: Remove iommu_group_get_by_id() iommu: Make iommu_release_device() static iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope() iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn) iommu/vt-d: Remove BUG_ON in map/unmap() iommu/vt-d: Remove BUG_ON when domain->pgd is NULL iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation iommu/vt-d: Remove BUG_ON on checking valid pfn range iommu/vt-d: Make size of operands same in bitwise operations iommu/vt-d: Remove PASID supervisor request support iommu/vt-d: Use non-privileged mode for all PASIDs iommu/vt-d: Remove extern from function prototypes iommu/vt-d: Do not use GFP_ATOMIC when not needed iommu/vt-d: Remove unnecessary checks in iopf disabling path iommu/vt-d: Move PRI handling to IOPF feature path iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path iommu/vt-d: Move iopf code from SVA to IOPF enabling path iommu/vt-d: Allow SVA with device-specific IOPF dmaengine: idxd: Add enable/disable device IOPF feature arm64: dts: mt8186: Add dma-ranges for the parent "soc" node ...
| | \
| | \
| *-. \ Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', ↵Joerg Roedel2023-04-144-13/+32
| |\ \ \ | | |_|/ | |/| | | | | | 'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next
| | | * iommu: Remove ioasid infrastructureJason Gunthorpe2023-03-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has no use anymore, delete it all. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-8-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | | * iommu/ioasid: Rename INVALID_IOASIDJacob Pan2023-03-314-6/+7
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename INVALID_IOASID and consolidate since we are moving away from IOASID infrastructure. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
| | * dmaengine: idxd: Add enable/disable device IOPF featureLu Baolu2023-04-131-6/+25
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu subsystem requires IOMMU_DEV_FEAT_IOPF must be enabled before and disabled after IOMMU_DEV_FEAT_SVA, if device's I/O page faults rely on the IOMMU. Add explicit IOMMU_DEV_FEAT_IOPF enabling/disabling in this driver. At present, missing IOPF enabling/disabling doesn't cause any real issue, because the IOMMU driver places the IOPF enabling/disabling in the path of SVA feature handling. But this may change. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230324120234.313643-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* / dmaengine: idxd: use const struct bus_type *Greg Kroah-Hartman2023-03-231-2/+2
|/ | | | | | | | | | | | | | In the functions unbind_store() and bind_store(), a struct bus_type * should be a const one, as the driver core bus functions used by this variable are expecting the pointer to be constant, and these functions do not modify the pointer at all. Cc: dmaengine@vger.kernel.org Acked-by: Vinod Koul <vkoul@kernel.org> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230313182918.1312597-32-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'dmaengine-6.3-rc1' of ↵Linus Torvalds2023-02-244-16/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "A new driver, couple of device support and binding conversion along with bunch of driver updates are the main features of this. New hardware support: - TI AM62Ax controller support - Xilinx xdma driver - Qualcomm SM6125, SM8550, QDU1000/QRU1000 GPI controller Updates: - Runtime pm support for at_xdmac driver - IMX sdma binding conversion to yaml and HDMI audio support - IMX mxs binding conversion to yaml" * tag 'dmaengine-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (35 commits) dmaengine: idma64: Update bytes_transferred field dmaengine: imx-sdma: Set DMA channel to be private dmaengine: dw: Move check for paused channel to dwc_get_residue() dmaengine: ptdma: check for null desc before calling pt_cmd_callback dmaengine: dw-axi-dmac: Do not dereference NULL structure dmaengine: idxd: Fix default allowed read buffers value in group dmaengine: sf-pdma: pdma_desc memory leak fix dmaengine: Simplify dmaenginem_async_device_register() function dmaengine: use sysfs_emit() to instead of scnprintf() dmaengine: Make an order in struct dma_device definition dt-bindings: dma: cleanup examples - indentation, lowercase hex dt-bindings: dma: drop unneeded quotes dmaengine: xilinx: xdma: Add user logic interrupt support dmaengine: xilinx: xdma: Add xilinx xdma driver dmaengine: drivers: Use devm_platform_ioremap_resource() dmaengine: at_xdmac: remove empty line dmaengine: at_xdmac: add runtime pm support dmaengine: at_xdmac: align properly function members dmaengine: ppc4xx: Convert to use sysfs_emit()/sysfs_emit_at() APIs dmaengine: sun6i: Set the maximum segment size ...
| * dmaengine: idxd: Fix default allowed read buffers value in groupFenghua Yu2023-02-162-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently default read buffers that is allowed in a group is 0. grpcfg will be configured to max read buffers that IDXD can support if the group's allowed read buffers value is 0. But 0 is an invalid read buffers value and user may get confused when seeing the invalid initial value 0 through sysfs interface. To show only valid allowed read buffers value and eliminate confusion, directly initialize the allowed read buffers to IDXD's max read buffers. User still can change the value through sysfs interface. Suggested-by: Ramesh Thomas <ramesh.thomas@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230127192855.966929-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: Set traffic class values in GRPCFG on DSA 2.0Fenghua Yu2022-12-283-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On DSA/IAX 1.0, TC-A and TC-B in GRPCFG are set as 1 to have best performance and cannot be changed through sysfs knobs unless override option is given. The same values should be set on DSA 2.0 as well. Fixes: ea7c8f598c32 ("dmaengine: idxd: restore traffic class defaults after wq reset") Fixes: ade8a86b512c ("dmaengine: idxd: Set defaults for GRPCFG traffic class") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221209172141.562648-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * dmaengine: idxd: Remove the unused function set_completion_address()Jiapeng Chong2022-12-281-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The function set_completion_address is defined in the dma.c file, but not called elsewhere, so remove this unused function. drivers/dma/idxd/dma.c:66:20: warning: unused function 'set_completion_address'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3416 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221212033514.5831-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds2023-02-231-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
| * | mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan2023-02-091-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | dmaengine: idxd: Do not call DMX TX callbacks during workqueue disableReinette Chatre2022-12-281-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On driver unload any pending descriptors are flushed and pending DMA descriptors are explicitly completed: idxd_dmaengine_drv_remove() -> drv_disable_wq() -> idxd_wq_free_irq() -> idxd_flush_pending_descs() -> idxd_dma_complete_txd() With this done during driver unload any remaining descriptor is likely stuck and can be dropped. Even so, the descriptor may still have a callback set that could no longer be accessible. An example of such a problem is when the dmatest fails and the dmatest module is unloaded. The failure of dmatest leaves descriptors with dma_async_tx_descriptor::callback pointing to code that no longer exist. This causes a page fault as below at the time the IDXD driver is unloaded when it attempts to run the callback: BUG: unable to handle page fault for address: ffffffffc0665190 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page Fix this by clearing the callback pointers on the transmit descriptors only when workqueue is disabled. Fixes: 403a2e236538 ("dmaengine: idxd: change MSIX allocation based on per wq activation") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/37d06b772aa7f8863ca50f90930ea2fd80b38fc3.1670452419.git.reinette.chatre@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | dmaengine: idxd: Prevent use after free on completion memoryReinette Chatre2022-12-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On driver unload any pending descriptors are flushed at the time the interrupt is freed: idxd_dmaengine_drv_remove() -> drv_disable_wq() -> idxd_wq_free_irq() -> idxd_flush_pending_descs(). If there are any descriptors present that need to be flushed this flow triggers a "not present" page fault as below: BUG: unable to handle page fault for address: ff391c97c70c9040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page The address that triggers the fault is the address of the descriptor that was freed moments earlier via: drv_disable_wq()->idxd_wq_free_resources() Fix the use after free by freeing the descriptors after any possible usage. This is done after idxd_wq_reset() to ensure that the memory remains accessible during possible completion writes by the device. Fixes: 63c14ae6c161 ("dmaengine: idxd: refactor wq driver enable/disable operations") Suggested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/6c4657d9cff0a0a00501a7b928297ac966e9ec9d.1670452419.git.reinette.chatre@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | dmaengine: idxd: Let probe fail when workqueue cannot be enabledReinette Chatre2022-12-281-2/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The workqueue is enabled when the appropriate driver is loaded and disabled when the driver is removed. When the driver is removed it assumes that the workqueue was enabled successfully and proceeds to free allocations made during workqueue enabling. Failure during workqueue enabling does not prevent the driver from being loaded. This is because the error path within drv_enable_wq() returns success unless a second failure is encountered during the error path. By returning success it is possible to load the driver even if the workqueue cannot be enabled and allocations that do not exist are attempted to be freed during driver remove. Some examples of problematic flows: (a) idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq(): In above flow, if idxd_wq_request_irq() fails then idxd_wq_unmap_portal() is called on error exit path, but drv_enable_wq() returns 0 because idxd_wq_disable() succeeds. The driver is thus loaded successfully. idxd_dmaengine_drv_remove()->drv_disable_wq()->idxd_wq_unmap_portal() Above flow on driver unload triggers the WARN in devm_iounmap() because the device resource has already been removed during error path of drv_enable_wq(). (b) idxd_dmaengine_drv_probe() -> drv_enable_wq() -> idxd_wq_request_irq(): In above flow, if idxd_wq_request_irq() fails then idxd_wq_init_percpu_ref() is never called to initialize the percpu counter, yet the driver loads successfully because drv_enable_wq() returns 0. idxd_dmaengine_drv_remove()->__idxd_wq_quiesce()->percpu_ref_kill(): Above flow on driver unload triggers a BUG when attempting to drop the initial ref of the uninitialized percpu ref: BUG: kernel NULL pointer dereference, address: 0000000000000010 Fix the drv_enable_wq() error path by returning the original error that indicates failure of workqueue enabling. This ensures that the probe fails when an error is encountered and the driver remove paths are only attempted when the workqueue was enabled successfully. Fixes: 1f2bb40337f0 ("dmaengine: idxd: move wq_enable() to device.c") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/e8d8116e5efa0fd14fadc5adae6ffd319f0e5ff1.1670452419.git.reinette.chatre@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* Merge tag 'dmaengine-6.2-rc1' of ↵Linus Torvalds2022-12-192-1/+68
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Qualcomm SDM670, SM6115 and SM6375 GPI controller support - Ingenic JZ4755 dmaengine support - Removal of iop-adma driver Updates: - Tegra support for dma-channel-mask - at_hdmac cleanup and virt-chan support for this driver" * tag 'dmaengine-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (46 commits) dmaengine: Revert "dmaengine: remove s3c24xx driver" dmaengine: tegra: Add support for dma-channel-mask dt-bindings: dmaengine: Add dma-channel-mask to Tegra GPCDMA dmaengine: idxd: Remove linux/msi.h include dt-bindings: dmaengine: qcom: gpi: add compatible for SM6375 dmaengine: idxd: Fix crc_val field for completion record dmaengine: at_hdmac: Convert driver to use virt-dma dmaengine: at_hdmac: Remove unused member of at_dma_chan dmaengine: at_hdmac: Rename "chan_common" to "dma_chan" dmaengine: at_hdmac: Rename "dma_common" to "dma_device" dmaengine: at_hdmac: Use bitfield access macros dmaengine: at_hdmac: Keep register definitions and structures private to at_hdmac.c dmaengine: at_hdmac: Set include entries in alphabetic order dmaengine: at_hdmac: Use pm_ptr() dmaengine: at_hdmac: Use devm_clk_get() dmaengine: at_hdmac: Use devm_platform_ioremap_resource dmaengine: at_hdmac: Use devm_kzalloc() and struct_size() dmaengine: at_hdmac: Introduce atc_get_llis_residue() dmaengine: at_hdmac: s/atc_get_bytes_left/atc_get_residue dmaengine: at_hdmac: Pass residue by address to avoid unnecessary implicit casts ...
| * dmaengine: idxd: Remove linux/msi.h includeThomas Gleixner2022-11-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Nothing in this file needs anything from linux/msi.h Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: dmaengine@vger.kernel.org Link: https://lore.kernel.org/r/20221113202428.573536003@linutronix.de Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * Merge branch 'fixes' into nextVinod Koul2022-11-115-12/+70
| |\ | | | | | | | | | Merge due to at_hdmac driver dependency
| * | dmaengine: idxd: Make read buffer sysfs attributes invisible for Intel IAAXiaochen Shen2022-11-041-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current code, the following sysfs attributes are exposed to user to show or update the values: max_read_buffers (max_tokens) read_buffer_limit (token_limit) group/read_buffers_allowed (group/tokens_allowed) group/read_buffers_reserved (group/tokens_reserved) group/use_read_buffer_limit (group/use_token_limit) >From Intel IAA spec [1], Intel IAA does not support Read Buffer allocation control. So these sysfs attributes should not be supported on IAA device. Fix this issue by making these sysfs attributes invisible through is_visible() filter when the device is IAA. Add description in the ABI documentation to mention that these attributes are not visible when the device does not support Read Buffer allocation control. [1]: https://cdrdv2.intel.com/v1/dl/getContent/721858 Fixes: fde212e44f45 ("dmaengine: idxd: deprecate token sysfs attributes for read buffers") Fixes: c52ca478233c ("dmaengine: idxd: add configuration component of driver") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221022074949.11719-1-xiaochen.shen@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * | dmaengine: idxd: Make max batch size attributes in sysfs invisible for Intel IAAXiaochen Shen2022-10-191-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current code, dev.max_batch_size and wq.max_batch_size attributes in sysfs are exposed to user to show or update the values. >From Intel IAA spec [1], Intel IAA does not support batch processing. So these sysfs attributes should not be supported on IAA device. Fix this issue by making the attributes of max_batch_size invisible in sysfs through is_visible() filter when the device is IAA. Add description in the ABI documentation to mention that the attributes are not visible when the device does not support batch. [1]: https://cdrdv2.intel.com/v1/dl/getContent/721858 Fixes: e7184b159dd3 ("dmaengine: idxd: add support for configurable max wq batch size") Fixes: c52ca478233c ("dmaengine: idxd: add configuration component of driver") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20220930201528.18621-3-xiaochen.shen@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | | Merge tag 'v6.1-rc7' into iommufd.git for-nextJason Gunthorpe2022-12-025-12/+70
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version. The rc fix was done a different way when iommufd patches reworked this code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | dmaengine: idxd: fix RO device state error after been disabled/resetFengqian Gao2022-11-081-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When IDXD is not configurable, that means its WQ, engine, and group configurations cannot be changed. But it can be disabled and its state should be set as disabled regardless it's configurable or not. Fix this by setting device state IDXD_DEV_DISABLED for read-only device as well in idxd_device_clear_state(). Fixes: cf4ac3fef338 ("dmaengine: idxd: fix lockdep warning on device driver removal") Signed-off-by: Fengqian Gao <fengqian.gao@intel.com> Reviewed-by: Xiaochen Shen <xiaochen.shen@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20220930032835.2290-1-fengqian.gao@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * | dmaengine: idxd: Fix max batch size for Intel IAAXiaochen Shen2022-11-084-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | >From Intel IAA spec [1], Intel IAA does not support batch processing. Two batch related default values for IAA are incorrect in current code: (1) The max batch size of device is set during device initialization, that indicates batch is supported. It should be always 0 on IAA. (2) The max batch size of work queue is set to WQ_DEFAULT_MAX_BATCH (32) as the default value regardless of Intel DSA or IAA device during work queue setup and cleanup. It should be always 0 on IAA. Fix the issues by setting the max batch size of device and max batch size of work queue to 0 on IAA device, that means batch is not supported. [1]: https://cdrdv2.intel.com/v1/dl/getContent/721858 Fixes: 23084545dbb0 ("dmaengine: idxd: set max_xfer and max_batch for RO device") Fixes: 92452a72ebdf ("dmaengine: idxd: set defaults for wq configs") Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20220930201528.18621-2-xiaochen.shen@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * | dmaengine: idxd: Do not enable user type Work Queue without Shared Virtual ↵Fenghua Yu2022-10-191-0/+18
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Addressing When the idxd_user_drv driver is bound to a Work Queue (WQ) device without IOMMU or with IOMMU Passthrough without Shared Virtual Addressing (SVA), the application gains direct access to physical memory via the device by programming physical address to a submitted descriptor. This allows direct userspace read and write access to arbitrary physical memory. This is inconsistent with the security goals of a good kernel API. Unlike vfio_pci driver, the IDXD char device driver does not provide any ways to pin user pages and translate the address from user VA to IOVA or PA without IOMMU SVA. Therefore the application has no way to instruct the device to perform DMA function. This makes the char device not usable for normal application usage. Since user type WQ without SVA cannot be used for normal application usage and presents the security issue, bind idxd_user_drv driver and enable user type WQ only when SVA is enabled (i.e. user PASID is enabled). Fixes: 448c3de8ac83 ("dmaengine: idxd: create user driver for wq 'device'") Cc: stable@vger.kernel.org Suggested-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20221014222541.3912195-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* / iommu: Remove SVM_FLAG_SUPERVISOR_MODE supportLu Baolu2022-11-032-26/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current kernel DMA with PASID support is based on the SVA with a flag SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address space to a PASID of the device. The device driver programs the device with kernel virtual address (KVA) for DMA access. There have been security and functional issues with this approach: - The lack of IOTLB synchronization upon kernel page table updates. (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.) - Other than slight more protection, using kernel virtual address (KVA) has little advantage over physical address. There are also no use cases yet where DMA engines need kernel virtual addresses for in-kernel DMA. This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface. The device drivers are suggested to handle kernel DMA with PASID through the kernel DMA APIs. The drvdata parameter in iommu_sva_bind_device() and all callbacks is not needed anymore. Cleanup them as well. Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/ Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
* dmaengine: idxd: add configuration for concurrent batch descriptor processingDave Jiang2022-09-294-3/+40
| | | | | | | | | | | | | Add sysfs knob to allow control of the number of batch descriptors that can be concurrently processed by an engine in the group as a fraction of the Maximum Work Descriptors in Progress value specfied in ENGCAP register. This control knob is part of toggle for QoS control. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20220917161222.2835172-6-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* dmaengine: idxd: add configuration for concurrent work descriptor processingDave Jiang2022-09-294-15/+75
| | | | | | | | | | | | | Add sysfs knob to allow control of the number of work descriptors that can be concurrently processed by an engine in the group as a fraction of the Maximum Work Descriptors in Progress value specified in ENGCAP register. This control knob is part of toggle for QoS control. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20220917161222.2835172-5-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>