summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* scsi: ufs: core: mcq: Fix the incorrect OCS value for the device commandStanley Chu2023-06-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In MCQ mode, when a device command uses a hardware queue shared with other commands, a race condition may occur in the following scenario: 1. A device command is completed in CQx with CQE entry "e". 2. The interrupt handler copies the "cqe" pointer to "hba->dev_cmd.cqe" and completes "hba->dev_cmd.complete". 3. The "ufshcd_wait_for_dev_cmd()" function is awakened and retrieves the OCS value from "hba->dev_cmd.cqe". However, there is a possibility that the CQE entry "e" will be overwritten by newly completed commands in CQx, resulting in an incorrect OCS value being received by "ufshcd_wait_for_dev_cmd()". To avoid this race condition, the OCS value should be immediately copied to the struct "lrb" of the device command. Then "ufshcd_wait_for_dev_cmd()" can retrieve the OCS value from the struct "lrb". Fixes: 57b1c0ef89ac ("scsi: ufs: core: mcq: Add support to allocate multiple queues") Suggested-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Link: https://lore.kernel.org/r/20230610021553.1213-2-powen.kao@mediatek.com Tested-by: Po-Wen Kao <powen.kao@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: block: Improve ioprio value validity checksDamien Le Moal2023-06-161-17/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | The introduction of the macro IOPRIO_PRIO_LEVEL() in commit eca2040972b4 ("scsi: block: ioprio: Clean up interface definition") results in an iopriority level to always be masked using the macro IOPRIO_LEVEL_MASK, and thus to the kernel always seeing an acceptable value for an I/O priority level when checked in ioprio_check_cap(). Before this patch, this function would return an error for some (but not all) invalid values for a level valid range of [0..7]. Restore and improve the detection of invalid priority levels by introducing the inline function ioprio_value() to check an ioprio class, level and hint value before combining these fields into a single value to be used with ioprio_set() or AIOs. If an invalid value for the class, level or hint of an ioprio is detected, ioprio_value() returns an ioprio using the class IOPRIO_CLASS_INVALID, indicating an invalid value and causing ioprio_check_cap() to return -EINVAL. Fixes: 6c913257226a ("scsi: block: Introduce ioprio hints") Fixes: eca2040972b4 ("scsi: block: ioprio: Clean up interface definition") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230608095556.124001-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Merge patch series "UFS host controller driver patches"Martin K. Petersen2023-05-311-1/+0
|\ | | | | | | | | | | | | | | | | | | Bart Van Assche <bvanassche@acm.org> says: Please consider these four UFS host controller driver patches for the next merge window. Link: https://lore.kernel.org/r/20230524203659.1394307-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: ufs: core: Simplify driver shutdownBart Van Assche2023-05-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | All UFS host drivers call ufshcd_shutdown(). Hence, instead of calling ufshcd_shutdown() from the host driver .shutdown() callback, inline that function into ufshcd_wl_shutdown(). Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230524203659.1394307-5-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge patch series "ufs: core: mcq: Add ufshcd_abort() and error handler ↵Martin K. Petersen2023-05-312-5/+23
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | support in MCQ mode" Bao D. Nguyen <quic_nguyenb@quicinc.com> says: This patch series enables support for ufshcd_abort() and error handler in MCQ mode. Link: https://lore.kernel.org/r/cover.1685396241.git.quic_nguyenb@quicinc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ufs: mcq: Use ufshcd_mcq_poll_cqe_lock() in MCQ modeBao D. Nguyen2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for adding MCQ error handler support, update the MCQ code to use the ufshcd_mcq_poll_cqe_lock() in interrupt context instead of using ufshcd_mcq_poll_cqe_nolock(). This is to keep synchronization between MCQ interrupt and error handler contexts because both need to access the MCQ hardware in separate contexts. Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/6ae727ad2a4040469b8f0632b55e0577d80da11b.1685396241.git.quic_nguyenb@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ufs: mcq: Add supporting functions for MCQ abortBao D. Nguyen2023-05-312-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add supporting functions to handle UFS abort in MCQ mode. Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/d452c5ad62dc863cc067ec82daa0885ec98bd508.1685396241.git.quic_nguyenb@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ufs: core: Combine 32-bit command_desc_base_addr_lo/hiBao D. Nguyen2023-05-311-4/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UTP command descriptor base address is a 57-bit field in the UTP transfer request descriptor. Combine the two 32-bit command_desc_base_addr_lo/hi fields into a 64-bit for better handling of this field. Signed-off-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Link: https://lore.kernel.org/r/4e6f7f5a15000cdae77c3014b477264f57bf572c.1685396241.git.quic_nguyenb@quicinc.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Tested-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <quic_cang@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: ufs: core: Do not open code SZ_xAvri Altman2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | Do not open code SZ_x. Signed-off-by: Avri Altman <avri.altman@wdc.com> Link: https://lore.kernel.org/r/20230531070009.4593-1-avri.altman@wdc.com Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Keoseong Park <keosung.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge patch series "ufs: Do not requeue while ungating the clock"Martin K. Petersen2023-05-312-1/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bart Van Assche <bvanassche@acm.org> says: In the traces we recorded while testing zoned storage we noticed that UFS commands are requeued while the clock is being ungated. Command requeueing makes it harder than necessary to preserve the command order. Hence this patch series that modifies the SCSI core and also the UFS driver such that clock ungating does not trigger command requeueing. Link: https://lore.kernel.org/r/20230529202640.11883-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ufs: Ungate the clock synchronouslyBart Van Assche2023-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ungating the clock asynchronously causes ufshcd_queuecommand() to return SCSI_MLQUEUE_HOST_BUSY and hence causes commands to be requeued. This is suboptimal. Allow ufshcd_queuecommand() to sleep such that clock ungating does not trigger command requeuing. Remove the ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests() calls because these are no longer needed. The flush_work(&hba->clk_gating.ungate_work) call is sufficient to make the SCSI core wait for clock ungating to complete. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230529202640.11883-6-bvanassche@acm.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Support setting BLK_MQ_F_BLOCKINGBart Van Assche2023-05-311-0/+6
| |/ | | | | | | | | | | | | | | | | | | | | Prepare for adding code in ufshcd_queuecommand() that may sleep. This patch is similar to a patch posted last year by Mike Christie. See also https://lore.kernel.org/all/20220308003957.123312-2-michael.christie@oracle.com/ Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230529202640.11883-3-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: core: Trace SCSI sense dataBart Van Assche2023-05-311-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a command fails, SCSI sense data is essential to determine why it failed. Hence make the sense key, ASC and ASCQ codes available in the ftrace output. Cc: Niklas Cassel <niklas.cassel@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.g.garry@oracle.com> Cc: Mike Christie <michael.christie@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230518193159.1166304-3-bvanassche@acm.org Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge patch series "Add Command Duration Limits support"Martin K. Petersen2023-05-226-25/+125
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Niklas Cassel <nks@flawful.org> says: This series adds support for Command Duration Limits. The series is based on linux tag: v6.4-rc1 The series can also be found in git: https://github.com/floatious/linux/commits/cdl-v7 ================= CDL in ATA / SCSI ================= Command Duration Limits is defined in: T13 ATA Command Set - 5 (ACS-5) and T10 SCSI Primary Commands - 6 (SPC-6) respectively (a simpler version of CDL is defined in T10 SPC-5). CDL defines Duration Limits Descriptors (DLD). 7 DLDs for read commands and 7 DLDs for write commands. Simply put, a DLD contains a limit and a policy. A command can specify that a certain limit should be applied by setting the DLD index field (3 bits, so 0-7) in the command itself. The DLD index points to one of the 7 DLDs. DLD index 0 means no descriptor, so no limit. DLD index 1-7 means DLD 1-7. A DLD can have a few different policies, but the two major ones are: -Policy 0xF (abort), command will be completed with command aborted error (ATA) or status CHECK CONDITION (SCSI), with sense data indicating that the command timed out. -Policy 0xD (complete-unavailable), command will be completed without error (ATA) or status GOOD (SCSI), with sense data indicating that the command timed out. Note that the command will not have transferred any data to/from the device when the command timed out, even though the command returned success. Regardless of the CDL policy, in case of a CDL timeout, the I/O will result in a -ETIME error to user-space. The DLDs are defined in the CDL log page(s) and are readable and writable. Reading and writing the CDL DLDs are outside the scope of the kernel. If a user wants to read or write the descriptors, they can do so using a user-space application that sends passthrough commands, such as cdl-tools: https://github.com/westerndigitalcorporation/cdl-tools ================================ The introduction of ioprio hints ================================ What the kernel does provide, is a method to let I/O use one of the CDL DLDs defined in the device. Note that the kernel will simply forward the DLD index to the device, so the kernel currently does not know, nor does it need to know, how the DLDs are defined inside the device. The way that the CDL DLD index is supplied to the kernel is by introducing a new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition. Currently, only 6 out of the 16 ioprio bits are in use, the remaining 10 bits are unused, and are currently explicitly disallowed to be set by the kernel. For now, we only add ioprio hints representing CDL DLD index 1-7. Additional ioprio hints for other QoS features could be defined in the future. A theoretical future work could be to make an I/O scheduler aware of these hints. E.g. for CDL, an I/O scheduler could make use of the duration limit in each descriptor, and take that information into account while scheduling commands. Right now, the ioprio hints will be ignored by the I/O schedulers. ============================== How to use CDL from user-space ============================== Since CDL is mutually exclusive with NCQ priority (see ncq_prio_enable and sas_ncq_prio_enable in Documentation/ABI/testing/sysfs-block-device), CDL has to be explicitly enabled using: echo 1 > /sys/block/$bdev/device/cdl_enable Since the ioprio hints are supplied through the existing I/O priority API, it should be simple for an application to make use of the ioprio hints. It simply has to reuse one of the new macros defined in include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(), and supply one of the new hints defined in include/uapi/linux/ioprio.h: IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should use the corresponding CDL DLD index 1-7. By reusing the I/O priority API, the user can both define a DLD to use per AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread (ioprio_set()). ======= Testing ======= With the following fio patches: https://github.com/floatious/fio/commits/cdl fio adds support for ioprio hints, such that CDL can be tested using e.g.: fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index A simple way to test is to use a DLD with a very short duration limit, and send large reads. Regardless of the CDL policy, in case of a CDL timeout, the I/O will result in a -ETIME error to user-space. We also provide a CDL test suite located in the cdl-tools repo, see: https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support We have tested this patch series using: -real hardware -the following QEMU implementation: https://github.com/floatious/qemu/tree/cdl (NOTE: the QEMU implementation requires you to define the CDL policy at compile time, so you currently need to recompile QEMU when switching between policies.) =================== Further information =================== For further information about CDL, see Damien's slides: Presented at SDC 2021: https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf Presented at Lund Linux Con 2022: https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing ================ Changes since V6 ================ -Rebased series on v6.4-rc1. -Picked up Reviewed-by tags from Hannes (Thank you Hannes!) -Picked up Reviewed-by tag from Christoph (Thank you Christoph!) -Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes. For older change logs, see previous patch series versions: https://lore.kernel.org/linux-scsi/20230406113252.41211-1-nks@flawful.org/ https://lore.kernel.org/linux-scsi/20230404182428.715140-1-nks@flawful.org/ https://lore.kernel.org/linux-scsi/20230309215516.3800571-1-niklas.cassel@wdc.com/ https://lore.kernel.org/linux-scsi/20230124190308.127318-1-niklas.cassel@wdc.com/ https://lore.kernel.org/linux-scsi/20230112140412.667308-1-niklas.cassel@wdc.com/ https://lore.kernel.org/linux-scsi/20221208105947.2399894-1-niklas.cassel@wdc.com/ Link: https://lore.kernel.org/r/20230511011356.227789-1-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ata: libata: Handle completion of CDL commands using policy 0xDNiklas Cassel2023-05-222-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A CDL timeout for policy 0xF is defined as a NCQ error, just with a CDL specific sk/asc/ascq in the sense data. Therefore, the existing code in libata does not need to be modified to handle a policy 0xF CDL timeout. For Command Duration Limits policy 0xD: The device shall complete the command without error with the additional sense code set to DATA CURRENTLY UNAVAILABLE. Since a CDL timeout for policy 0xD is not an error, we cannot use the NCQ Command Error log (10h). Instead, we need to read the Sense Data for Successful NCQ Commands log (0Fh). In the success case, just like in the error case, we cannot simply read a log page from the interrupt handler itself, since reading a log page involves sending a READ LOG DMA EXT or READ LOG EXT command. Therefore, we add a new EH action ATA_EH_GET_SUCCESS_SENSE. When a command completes without error, and when the ATA_SENSE bit is set, this new action is set as pending, and EH is scheduled. This way, similar to the NCQ error case, the log page will be read from EH context. An alternative would have been to add a new kthread or workqueue to handle this. However, extending EH can be done with minimal changes and avoids the need to synchronize a new kthread/workqueue with EH. Co-developed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-20-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ata: libata: Set read/write commands CDL indexDamien Le Moal2023-05-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For devices supporting the command duration limits feature, translate the dld field of read and write operation to set the command duration limit index field of the command task file when the duration limit feature is enabled. The function ata_set_tf_cdl() is introduced to do this. For unqueued (non NCQ) read and write operations, this function sets the command duration limit index set as the lower 3 bits of the feature field. For queued NCQ read/write commands, the index is set as the lower 3 bits of the auxiliary field. The flag ATA_QCFLAG_HAS_CDL is introduced to indicate that a command taskfile has a non zero cdl field. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-19-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ata: libata: Add ATA feature control sub-page translationDamien Le Moal2023-05-222-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the ATA feature control sub-page of the control mode page to enable/disable the command duration limits feature using the cdl_ctrl field of the ATA feature control sub-page. Both mode sense and mode select translation are supported. For mode sense, the ata device flag ATA_DFLAG_CDL_ENABLED is used to cache the status of the command duration limits feature. Enabling this feature is done using a SET FEATURES command with a cdl action set to 1 when the page cdl_ctrl field value is 0x2 (T2A and T2B pages supported). If this field is 0, CDL is disabled using the SET FEATURES command with a cdl action set to 0. Since a device CDL and NCQ priority features should not be used simultaneously, ata_mselect_control_ata_feature() returns an error when attempting to enable CDL with the device priority feature enabled. Conversely, the function ata_ncq_prio_enable_store() used to enable the use of the device NCQ priority feature through sysfs is modified to return an error if the device CDL feature is enabled. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-18-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: ata: libata: Detect support for command duration limitsDamien Le Moal2023-05-222-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the supported capabilities identify device data log page to detect if a device supports the command duration limits feature. For devices supporting this feature, set the device flag ATA_DFLAG_CDL. To support SCSI-ATA translation, retrieve the command duration limits log page 18h and cache this page content using the cdl array added to the ata_device data structure. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-15-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Allow enabling and disabling command duration limitsDamien Le Moal2023-05-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs scsi_device attribute cdl_enable to allow a user to enable or disable a device command duration limits feature. CDL is disabled by default. This feature must be explicitly enabled by a user by setting the cdl_enable attribute to 1. The new function scsi_cdl_enable() does not do anything beside setting the cdl_enable field of struct scsi_device in the case of a (real) SCSI device (e.g. a SAS HDD). For ATA devices, the command duration limits feature needs to be enabled/disabled using the ATA feature sub-page of the control mode page. To do so, the scsi_cdl_enable() function checks if this mode page is supported using scsi_mode_sense(). If it is, scsi_mode_select() is used to enable and disable CDL. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-10-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Detect support for command duration limitsDamien Le Moal2023-05-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the function scsi_cdl_check() to detect if a device supports command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32 and WRITE 32 commands are checked using the function scsi_report_opcode() to probe the rwcdlp and cdlp bits as they indicate the mode page defining the command duration limits descriptors that apply to the command being tested. If any of these commands support CDL, the field cdl_supported of struct scsi_device is set to 1 to indicate that the device supports CDL. Support for CDL for a device is advertizes through sysfs using the new cdl_supported device attribute. This attribute value is 1 for a device supporting CDL and 0 otherwise. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Support Service Action in scsi_report_opcode()Damien Le Moal2023-05-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The REPORT_SUPPORTED_OPERATION_CODES command allows checking for support of commands that have the same opcode but different service actions, such as READ 32 and WRITE 32. However, the current implementation of scsi_report_opcode() only allows checking an operation code without a service action differentiation. Add the "sa" argument to scsi_report_opcode() to allow passing a service action. If a non-zero service action is specified, the reporting options field value is set to 3 to have the service action field taken into account by the device. If no service action field is specified (zero), the reporting options field is set to 1 as before. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-8-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Support retrieving sub-pages of mode pagesDamien Le Moal2023-05-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding the subpage argument. Change all the current caller sites to specify the subpage 0. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-7-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Allow libata to complete successful commands via EHNiklas Cassel2023-05-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In SCSI, we get the sense data as part of the completion, for ATA however, we need to fetch the sense data as an extra step. For an aborted ATA command the sense data is fetched via libata's ->eh_strategy_handler(). For Command Duration Limits policy 0xD: The device shall complete the command without error with the additional sense code set to DATA CURRENTLY UNAVAILABLE. In order to handle this policy in libata, we intend to send a successful command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the sense data for the good command. This is similar to how we handle an aborted ATA command, just that we need to read the Successful NCQ Commands log instead of the NCQ Command Error log. When we get a SATA completion with successful commands, ATA_SENSE will be set, indicating that some commands in the completion have sense data. The sense_valid bitmask in the Sense Data for Successful NCQ Commands log will inform exactly which commands that had sense data, which might be a subset of all the commands that was completed in the same completion. (Yet all will have ATA_SENSE set, since the status is per completion.) The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set the scmd->result to DID_TIME_OUT for these commands. However, the successful commands that did not have sense data, must not get their result marked as DID_TIME_OUT by SCSI EH. Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD. This will be used by libata in a subsequent commit. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: block: Introduce BLK_STS_DURATION_LIMITDamien Le Moal2023-05-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the new block I/O status BLK_STS_DURATION_LIMIT for LLDDs to report command that failed due to a command duration limit being exceeded. This new status is mapped to the ETIME error code to allow users to differentiate "soft" duration limit failures from other more serious hardware related errors. If we compare BLK_STS_DURATION_LIMIT with BLK_STS_TIMEOUT: -BLK_STS_DURATION_LIMIT means that the drive gave a reply indicating that the command duration limit was exceeded before the command could be completed. This I/O status is mapped to ETIME for user space. -BLK_STS_TIMEOUT means that the drive never gave a reply at all. This I/O status is mapped to ETIMEDOUT for user space. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-4-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: block: Introduce ioprio hintsDamien Le Moal2023-05-221-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I/O priorities currently only use 6-bits of the 16-bits ioprio value: the 3-upper bits are used to define up to 8 priority classes (4 of which are valid) and the 3 lower bits of the value are used to define a priority level for the real-time and best-effort class. The remaining 10-bits between the I/O priority class and level are unused, and in fact, cannot be used by the user as doing so would either result in the value being completely ignored, or in an error returned by ioprio_check_cap(). Use these 10-bits of an ioprio value to allow a user to specify I/O hints. An I/O hint is defined as a 10-bitsvalue, allowing up to 1023 different hints to be specified, with the value 0 being reserved as the "no hint" case. An I/O hint can apply to any I/O that specifies a valid priority class other than NONE, regardless of the I/O priority level specified. To do so, the macros IOPRIO_PRIO_HINT() and IOPRIO_PRIO_VALUE_HINT() are introduced in include/uapi/linux/ioprio.h to respectively allow a user to get and set a hint in an ioprio value. To support the ATA and SCSI command duration limits feature, 7 hints are defined: IOPRIO_HINT_DEV_DURATION_LIMIT_1 to IOPRIO_HINT_DEV_DURATION_LIMIT_7, allowing a user to specify which command duration limit descriptor should be applied to the commands serving an I/O. Specifying these hints has for now no effect whatsoever if the target block devices do not support the command duration limits feature. However, in the future, block I/O schedulers can be modified to optimize I/O issuing order based on these hints, even for devices that do not support the command duration limits feature. Given that the 7 duration limits hints defined have no effect on any block layer component, the actual definition of the duration limits implied by these hints remains at the device level. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-3-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: block: ioprio: Clean up interface definitionDamien Le Moal2023-05-221-5/+14
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The I/O priority user interface defines the 16-bits ioprio values as the combination of the upper 3-bits for an I/O priority class and the lower 13-bits as priority data. However, the kernel only uses the lower 3-bits of the priority data to define priority levels for the RT and BE priority classes. The data part of an ioprio value is completely ignored for the IDLE and NONE classes. This is enforced by checks done in ioprio_check_cap(), which is called for all paths that allow defining an I/O priority for I/Os: the per-context ioprio_set() system call, aio interface and io_uring interface. Clarify this fact in the uapi ioprio.h header file and introduce the IOPRIO_PRIO_LEVEL_MASK and IOPRIO_PRIO_LEVEL() macros for users to define and get priority levels in an ioprio value. The coarser macro IOPRIO_PRIO_DATA() is retained for backward compatibility with old applications already using it. There is no functional change introduced with this. In-kernel users of the IOPRIO_PRIO_DATA() macro which are explicitly handling I/O priority data as a priority level are modified to use the new IOPRIO_PRIO_LEVEL() macro without any functional change. Since f2fs is the only user of this macro not explicitly using that value as a priority level, it is left unchanged. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-2-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge patch series "Use block pr_ops in LIO"Martin K. Petersen2023-05-227-13/+96
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Christie <michael.christie@oracle.com> says: The patches in this thread allow us to use the block pr_ops with LIO's target_core_iblock module to support cluster applications in VMs. They were built over Linus's tree. They also apply over linux-next and Martin's tree and Jens's trees. Currently, to use windows clustering or linux clustering (pacemaker + cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you have to use tcmu or pscsi or use a cluster aware FS/framework for the LIO pr file. Setting up a cluster FS/framework is pain and waste when your real backend device is already a distributed device, and pscsi and tcmu are nice for specific use cases, but iblock gives you the best performance and allows you to use stacked devices like dm-multipath. So these patches allow iblock to work like pscsi/tcmu where they can pass a PR command to the backend module. And then iblock will use the pr_ops to pass the PR command to the real devices similar to what we do for unmap today. The patches are separated in the following groups: Patch 1 - 2: - Add block layer callouts for reading reservations and rename reservation error code. Patch 3 - 5: - SCSI support for new callouts. Patch 6: - DM support for new callouts. Patch 7 - 13: - NVMe support for new callouts. Patch 14 - 18: - LIO support for new callouts. This patchset has been tested with the libiscsi PGR ops and with window's failover cluster verification test. Note that for scsi backend devices we need this patchset: https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7 to handle UAs. To reduce the size of this patchset that's being done separately to make reviewing easier. And to make merging easier this patchset and the one above do not have any conflicts so can be merged in different trees. Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: target: Pass struct target_opcode_descriptor to enabledMike Christie2023-04-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iblock pr_ops support does not support commands that require port or I_T Nexus info. This adds a struct target_opcode_descriptor as an argument to the enabled callout so we can still have the common tcm_is_pr_enabled and tcm_is_scsi2_reservations_enabled functions and also determine if the command is supported based on the command and service action and device settings. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-17-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: target: Allow backends to hook into PR handlingMike Christie2023-04-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the cases where you want to export a device to a VM via a single I_T nexus and want to passthrough the PR handling to the physical/real device you have to use pscsi or tcmu. Both are good for specific uses however for the case where you want good performance, and are not using SCSI devices directly (using DM/MD RAID or multipath devices) then we are out of luck. The following patches allow iblock to mimimally hook into the LIO PR code and then pass the PR handling to the physical device. Note that like with the tcmu an pscsi cases it's only supported when you export the device via one I_T nexus. This patch adds the initial LIO callouts. The next patch will modify iblock. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-16-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: target: Rename sbc_ops to exec_cmd_opsMike Christie2023-04-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The next patches allow us to call the block layer's pr_ops from the backends. This will require allowing the backends to hook into the cmd processing for SPC commands, so this renames sbc_ops to a more generic exec_cmd_ops. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-15-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * nvme: Add a nvme_pr_type enumMike Christie2023-04-111-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch adds support to report the reservation type, so we need to be able to convert from the NVMe PR value we get from the device to the linux block layer PR value that will be returned to callers. To prepare for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type function. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-13-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * nvme: Add pr_ops read_keys supportMike Christie2023-04-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the pr_ops read_keys callout by calling the NVMe Reservation Report helper, then parsing that info to get the controller's registered keys. Because the callout is only used in the kernel where the callers, like LIO, do not know about controller/host IDs, the callout just returns the registered keys which is required by the SCSI PR in READ KEYS command. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-12-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * nvme: Fix reservation status related structsMike Christie2023-04-111-8/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the following issues with the reservation status structs: 1. resv10 is bytes 23:10 so it should be 14 bytes. 2. regctl_ds only supports 64 bit host IDs. These are not currently used, but will be in this patchset which adds support for the reservation report command. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-8-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: Add support for block PR read keys/reservationMike Christie2023-04-112-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support in sd.c for the block PR read keys and read reservation callouts, so upper layers like LIO can get the PR info that's been setup using the existing pr callouts and return it to initiators. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-6-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: Move sd_pr_type to scsi_commonMike Christie2023-04-111-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LIO is going to want to do the same block to/from SCSI pr types as sd.c so this moves the sd_pr_type helper to scsi_common and renames it. The next patch will then also add a helper to go from the SCSI value to the block one for use with PERSISTENT_RESERVE_IN commands. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-5-michael.christie@oracle.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICTMike Christie2023-04-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's locking feature which works similar to NVMe/SCSI reservations where a host can get a lock on a device and when the lock is taken it will get failures. This patch renames BLK_STS_NEXUS so it better reflects this type of use. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-3-michael.christie@oracle.com Acked-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * block: Add PR callouts for read keys and reservationMike Christie2023-04-111-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add callouts for reading keys and reservations. This allows LIO to support the READ_KEYS and READ_RESERVATION commands so it can export devices to VMs for software like windows clustering. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-2-michael.christie@oracle.com Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | Merge tag 'mailbox-v6.4' of ↵Linus Torvalds2023-05-072-0/+124
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - mailbox api: allow direct registration to a channel and convert omap and pcc to use mbox_bind_client - omap and hi6220 : use of_property_read_bool - test: fix double-free and use spinlock header - rockchip and bcm-pdc: drop of_match_ptr - mpfs: change config symbol - mediatek gce: support MT6795 - qcom apcs: consolidate of_device_id and support IPQ9574 * tag 'mailbox-v6.4' of git://git.linaro.org/landing-teams/working/fujitsu/integration: dt-bindings: mailbox: qcom: add compatible for IPQ9574 SoC mailbox: qcom-apcs-ipc: do not grow the of_device_id dt-bindings: mailbox: qcom,apcs-kpss-global: use fallbacks for few variants dt-bindings: mailbox: mediatek,gce-mailbox: Add support for MT6795 mailbox: mpfs: convert SOC_MICROCHIP_POLARFIRE to ARCH_MICROCHIP_POLARFIRE mailbox: bcm-pdc: drop of_match_ptr for ID table mailbox: rockchip: drop of_match_ptr for ID table mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write() mailbox: mailbox-test: Explicitly include header for spinlock support mailbox: Use of_property_read_bool() for boolean properties mailbox: pcc: Use mbox_bind_client mailbox: omap: Use mbox_bind_client mailbox: Allow direct registration to a channel
| * | dt-bindings: mailbox: mediatek,gce-mailbox: Add support for MT6795AngeloGioacchino Del Regno2023-05-041-0/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add a compatible string for the MT6795 Helio X10 SoC using MT8173 binding and add a header for the MT6795's GCE mailbox. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
| * | mailbox: Allow direct registration to a channelElliot Berman2023-04-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support virtual mailbox controllers and clients which are not platform devices or come from the devicetree by allowing them to match client to channel via some other mechanism. Tested-by: Sudeep Holla <sudeep.holla@arm.com> (pcc) Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
* | | Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linuxLinus Torvalds2023-05-071-1/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more io_uring updates from Jens Axboe: "Nothing major in here, just two different parts: - A small series from Breno that enables passing the full SQE down for ->uring_cmd(). This is a prerequisite for enabling full network socket operations. Queued up a bit late because of some stylistic concerns that got resolved, would be nice to have this in 6.4-rc1 so the dependent work will be easier to handle for 6.5. - Fix for the huge page coalescing, which was a regression introduced in the 6.3 kernel release (Tobias)" * tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux: io_uring: Remove unnecessary BUILD_BUG_ON io_uring: Pass whole sqe to commands io_uring: Create a helper to return the SQE size io_uring/rsrc: check for nonconsecutive pages
| * | | io_uring: Pass whole sqe to commandsBreno Leitao2023-05-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently uring CMD operation relies on having large SQEs, but future operations might want to use normal SQE. The io_uring_cmd currently only saves the payload (cmd) part of the SQE, but, for commands that use normal SQE size, it might be necessary to access the initial SQE fields outside of the payload/cmd block. So, saves the whole SQE other than just the pdu. This changes slightly how the io_uring_cmd works, since the cmd structures and callbacks are not opaque to io_uring anymore. I.e, the callbacks can look at the SQE entries, not only, in the cmd structure. The main advantage is that we don't need to create custom structures for simple commands. Creates io_uring_sqe_cmd() that returns the cmd private data as a null pointer and avoids casting in the callee side. Also, make most of ublk_drv's sqe->cmd priv structure into const, and use io_uring_sqe_cmd() to get the private structure, removing the unwanted cast. (There is one case where the cast is still needed since the header->{len,addr} is updated in the private structure) Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230504121856.904491-3-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linuxLinus Torvalds2023-05-062-8/+20
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: - MD pull request via Song: - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara) - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick) - Cleanup of hweight type used for cgroup writeback (Maxim) - Fix a regression with the "has_submit_bio" changes across partitions (Ming) - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on queues non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya) - Fix the flags being set in a bio for a flush for drbd (Christoph) - Cleanup and deduplication of the code handling setting block device capacity (Damien) - Fix for ublk handling IO timeouts (Ming) - Fix for a regression in blk-cgroup teardown (Tao) - NBD documentation and code fixes (Eric) - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas) * tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux: ublk: add timeout handler drbd: correctly submit flush bio on barrier mailmap: add mailmap entries for Jens Axboe block: Skip destroyed blkg when restart in blkg_destroy_all() writeback: fix call of incorrect macro md: Fix bitmap offset type in sb writer md/raid5: Improve performance for sequential IO docs nbd: userspace NBD now favors github over sourceforge block nbd: use req.cookie instead of req.handle uapi nbd: add cookie alias to handle uapi nbd: improve doc links to userspace spec blk-integrity: register sysfs attributes on struct device blk-integrity: convert to struct device_attribute blk-integrity: use sysfs_emit block/drivers: remove dead clear of random flag block: sync part's ->bd_has_submit_bio with disk's block: Cleanup set_capacity()/bdev_set_nr_sectors()
| * | | | uapi nbd: add cookie alias to handleEric Blake2023-04-271-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uapi <linux/nbd.h> header declares a 'char handle[8]' per request; which is overloaded in English (are you referring to "handle" the verb, such as handling a signal or writing a callback handler, or "handle" the noun, the value used in a lookup table to correlate a response back to the request). Many user-space NBD implementations (both servers and clients) have instead used 'uint64_t cookie' or similar, as it is easier to directly assign an integer than to futz around with memcpy. In fact, upstream documentation is now encouraging this shift in terminology: https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Accomplish this by use of an anonymous union to provide the alias for anyone getting the definition from the uapi; this does not break existing clients, while exposing the nicer name for those who prefer it. Note that block/nbd.c still uses the term handle (in fact, it actually combines a 32-bit cookie and a 32-bit tag into the 64-bit handle), but that internal usage is not changed by the public uapi, since no compliant NBD server has any reason to inspect or alter the 64 bits sent over the socket. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230410180611.1051618-3-eblake@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | uapi nbd: improve doc links to userspace specEric Blake2023-04-271-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uapi <linux/nbd.h> header intentionally documents only the NBD server features that the kernel module will utilize as a client. But while it already had one mention of skipped bits due to userspace extensions, it did not actually direct the reader to the canonical source to learn about those extensions. While touching comments, fix an outdated reference that listed only READ and WRITE as commands. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20230410180611.1051618-2-eblake@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | blk-integrity: register sysfs attributes on struct deviceThomas Weißschuh2023-04-261-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "integrity" kobject only acted as a holder for static sysfs entries. It also was embedded into struct gendisk without managing it, violating assumptions of the driver core. Instead register the sysfs entries directly onto the struct device. Also drop the now unused member integrity_kobj from struct gendisk. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230309-kobj_release-gendisk_integrity-v3-3-ceccb4493c46@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | Merge tag 'net-6.4-rc1' of ↵Linus Torvalds2023-05-053-10/+13
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - sched: act_pedit: free pedit keys on bail from offset check Current release - new code bugs: - pds_core: - Kconfig fixes (DEBUGFS and AUXILIARY_BUS) - fix mutex double unlock in error path Previous releases - regressions: - sched: cls_api: remove block_cb from driver_list before freeing - nf_tables: fix ct untracked match breakage - eth: mtk_eth_soc: drop generic vlan rx offload - sched: flower: fix error handler on replace Previous releases - always broken: - tcp: fix skb_copy_ubufs() vs BIG TCP - ipv6: fix skb hash for some RST packets - af_packet: don't send zero-byte data in packet_sendmsg_spkt() - rxrpc: timeout handling fixes after moving client call connection to the I/O thread - ixgbe: fix panic during XDP_TX with > 64 CPUs - igc: RMW the SRRCTL register to prevent losing timestamp config - dsa: mt7530: fix corrupt frames using TRGMII on 40 MHz XTAL MT7621 - r8152: - fix flow control issue of RTL8156A - fix the poor throughput for 2.5G devices - move setting r8153b_rx_agg_chg_indicate() to fix coalescing - enable autosuspend - ncsi: clear Tx enable mode when handling a Config required AEN - octeontx2-pf: macsec: fixes for CN10KB ASIC rev Misc: - 9p: remove INET dependency" * tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop() pds_core: fix mutex double unlock in error path net/sched: flower: fix error handler on replace Revert "net/sched: flower: Fix wrong handle assignment during filter change" net/sched: flower: fix filter idr initialization net: fec: correct the counting of XDP sent frames bonding: add xdp_features support net: enetc: check the index of the SFI rather than the handle sfc: Add back mailing list virtio_net: suppress cpu stall when free_unused_bufs ice: block LAN in case of VF to VF offload net: dsa: mt7530: fix network connectivity with multiple CPU ports net: dsa: mt7530: fix corrupt frames using trgmii on 40 MHz XTAL MT7621 9p: Remove INET dependency netfilter: nf_tables: fix ct untracked match breakage af_packet: Don't send zero-byte data in packet_sendmsg_spkt(). igc: read before write to SRRCTL register pds_core: add AUXILIARY_BUS and NET_DEVLINK to Kconfig pds_core: remove CONFIG_DEBUG_FS from makefile ionic: catch failure from devlink_alloc ...
| * \ \ \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextJakub Kicinski2023-05-051-0/+1
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a fix which landed in net-next, pull it in along with the couple of minor cleanups. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | | | bonding: add xdp_features supportLorenzo Bianconi2023-05-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce xdp_features support for bonding driver according to the slave devices attached to the master one. xdp_features is required whenever we want to xdp_redirect traffic into a bond device and then into selected slaves attached to it. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jussi Maki <joamaki@gmail.com> Tested-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | netfilter: nf_tables: deactivate anonymous set from preparation phasePablo Neira Ayuso2023-05-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Toggle deleted anonymous sets as inactive in the next generation, so users cannot perform any update on it. Clear the generation bitmask in case the transaction is aborted. The following KASAN splat shows a set element deletion for a bound anonymous set that has been already removed in the same transaction. [ 64.921510] ================================================================== [ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.924745] Write of size 8 at addr dead000000000122 by task test/890 [ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253 [ 64.931120] Call Trace: [ 64.932699] <TASK> [ 64.934292] dump_stack_lvl+0x33/0x50 [ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.937551] kasan_report+0xda/0x120 [ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.942452] ? __kasan_slab_alloc+0x2d/0x60 [ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables] [ 64.945710] ? kasan_set_track+0x21/0x30 [ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink] [ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>