summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* soundwire: stream: use FIELD_{GET|PREP}Vinod Koul2020-09-041-8/+5
| | | | | | | | | | use FIELD_{GET|PREP} in stream code to get/set field values instead of open coding masks and shift operations. Signed-off-by: Vinod Koul <vkoul@kernel.org> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200903114504.1202143-5-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: slave: use SDW_DISCO_LINK_ID()Vinod Koul2020-09-041-1/+1
| | | | | | | | | | use SDW_DISCO_LINK_ID() in slave code to extract field values instead of open coding masks and shift operations to extract link_id Signed-off-by: Vinod Koul <vkoul@kernel.org> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200903114504.1202143-4-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: bus: use FIELD_GET()Vinod Koul2020-09-041-3/+3
| | | | | | | | | | use FIELD_GET() in bus code to extract field values instead of open coding masks and shift operations. Signed-off-by: Vinod Koul <vkoul@kernel.org> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200903114504.1202143-3-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: define and use addr bit masksVinod Koul2020-09-041-7/+14
| | | | | | | | | | | Soundwire addr is a 52bit value encoding link, version, unique id, mfg id, part id and class id. Define bit masks for these and use FIELD_GET() to extract these fields. Signed-off-by: Vinod Koul <vkoul@kernel.org> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200903114504.1202143-2-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: don't manage link power individuallyPierre-Louis Bossart2020-09-031-24/+46
| | | | | | | | | | | | | | Each link has separate power controls, but experimental results show we need to use an all-or-none approach to the link power management. This change has marginal power impacts, the DSP needs to be powered anyways before SoundWire links can be powered, and even when powered a link can be in clock-stopped mode. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-11-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: pass link_mask information to each masterPierre-Louis Bossart2020-09-032-0/+3
| | | | | | | | | | | While the hardware exposes independent bits to power-up each master, the recommended sequence is to power all links or none. Idle links can still use the clock stop mode while the master is powered. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-10-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add error log for clock-stop invalid configsPierre-Louis Bossart2020-09-031-0/+5
| | | | | | | | | | | | Detect cases where the clock is assumed to be stopped but the IP is not in the relevant state. There is no real way to recover here, but adding an error log can help detect bad programming sequences or race conditions. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-9-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: stream: enable hw_sync as needed by hardwarePierre-Louis Bossart2020-09-031-6/+9
| | | | | | | | | | Use platform-specific information to decide when to use hw_sync, not only a number of links > 1. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-8-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add multi-link hw_synchronization informationPierre-Louis Bossart2020-09-031-0/+7
| | | | | | | | | set the flags as required by hardware implementation Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-7-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: bus: update multi-link definition with hw sync detailsPierre-Louis Bossart2020-09-031-0/+6
| | | | | | | | | | | | | | | | | | Hardware-based synchronization is typically required when the bus->multi_link flag is set. On Intel platforms, when the Cadence IP is configured in 'Multi Master Mode', the hardware synchronization is required even when a stream only uses a single segment. The existing code only deal with hardware synchronization when a stream uses more than one segment so to remain backwards compatible we add a configuration threshold. For Intel cases this threshold will be set to one, other platforms may be able to use the SSP-based sync in those cases. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-6-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add missing support for all clock stop modesPierre-Louis Bossart2020-09-031-10/+39
| | | | | | | | | | | Deal with the BUS_RESET case, which is the default. The only change is to add support for the exit sequence using the syncArm/syncGo mode for the exit reset sequence. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-5-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add multi-link supportPierre-Louis Bossart2020-09-031-10/+110
| | | | | | | | | | | | The multi-link support is enabled with a hardware gsync signal connecting all links. All commands and operations which typically are handled on an SSP boundary will be deferred further and enabled across all links with the 'syncGo' sequence. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: ignore software command retriesPierre-Louis Bossart2020-09-031-0/+5
| | | | | | | | | | with multiple links synchronized in hardware, retrying commands in software is not recommended. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: disable shim wake on suspendPierre-Louis Bossart2020-09-031-0/+18
| | | | | | | | | | | | If we enabled the clock stop mode and suspend, we need to disable the shim wake. We do so only if the parent is pm_runtime active due to power rail dependencies. GitHub issue: https://github.com/thesofproject/linux/issues/1678 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200901150556.19432-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: fix port_ready[] dynamic allocation in mipi_discoPierre-Louis Bossart2020-09-033-18/+6
| | | | | | | | | | | | | | | | | | | | The existing code allocates memory for the total number of ports. This only works if the ports are contiguous, but will break if e.g. a Devices uses port0, 1, and 14. The port_ready[] array would contain 3 elements, which would lead to an out-of-bounds access. Conversely in other cases, the wrong port index would be used leading to timeouts on prepare. This can be fixed by allocating for the worst-case of 15 ports (DP0..DP14). In addition since the number is now fixed, we can use an array instead of a dynamic allocation. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Link: https://lore.kernel.org/r/20200831134318.11443-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: add definition for maximum number of portsPierre-Louis Bossart2020-09-031-1/+2
| | | | | | | | | | | A Device may have at most 15 physical ports (DP0, DP1..DP14). Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Link: https://lore.kernel.org/r/20200831134318.11443-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* ASoC: codecs: soundwire: remove port_ready[] usage from codecs.Pierre-Louis Bossart2020-09-036-101/+6
| | | | | | | | | | | | | | port_ready will be changed to a fixed array in sdw.h and all initialization work will be done at soundwire side in the following patch. So remove them from codec drivers. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200831134318.11443-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: fix intel_suspend/resume defined but not used warningBard Liao2020-08-261-2/+2
| | | | | | | | | | | | | | | | | | | | | When CONFIG_PM_SLEEP is not defined, GCC throws compilation warnings: drivers/soundwire/intel.c:1799:12: warning: ‘intel_resume’ defined but not used [-Wunused-function] static int intel_resume(struct device *dev) ^~~~~~~~~~~~ drivers/soundwire/intel.c:1683:12: warning: ‘intel_suspend’ defined but not used [-Wunused-function] static int intel_suspend(struct device *dev) ^~~~~~~~~~~~~ Fix by using __maybe_unused macro. Suggested-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200824133234.28115-1-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: refine runtime pm for SDW_INTEL_CLK_STOP_BUS_RESETRander Wang2020-08-181-2/+17
| | | | | | | | | | | | | | | | When all the links are suspended, the HDaudio controller may suspend and the power rails to the SoundWire IP may be disabled, requiring a complete re-initialization/enumeration on resume. However, if one or more Masters remained active, the HDaudio controller will remain active and the power rails will remain enabled. As a result, during the link resume step we can check if the context was preserved by verifying if the clock was stopped, and avoid doing a complete bus reset and re-enumeration. Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-13-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: support clock_stop mode without quirksPierre-Louis Bossart2020-08-181-1/+20
| | | | | | | | | | | In this mode, on restart the bus restarts immediately, the Slaves remain synchronized and all context is kept intact. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-12-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel_init: handle power rail dependencies for clock stop modePierre-Louis Bossart2020-08-181-0/+13
| | | | | | | | | | | | | | | | | When none of the clock stop quirks is specified, the Master IP will assume the context is preserved and will not reset the Bus and restart enumeration. Due to power rail dependencies, the HDaudio controller needs to remain powered and prevented from executing its pm_runtime suspend routine. This choice of course has a power impact, and this mode should only be selected when latency requirements are critical or the parent device can enter D0ix modes. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-11-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add CLK_STOP_NOT_ALLOWED supportPierre-Louis Bossart2020-08-181-0/+20
| | | | | | | | | | In case the clock needs to keep running, we need to prevent the Master from entering pm_runtime suspend. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-10-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add CLK_STOP_BUS_RESET supportRander Wang2020-08-181-0/+44
| | | | | | | | | | | | | | | | Move existing pm_runtime suspend under the CLK_STOP_TEARDOWN case. In this mode the Master IP will lose all context but in-band wakes are supported. On pm_runtime resume a complete re-enumeration will be performed after a bus reset. Signed-off-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-9-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add CLK_STOP_TEARDOWN for pm_runtime suspendPierre-Louis Bossart2020-08-183-31/+54
| | | | | | | | | | | | Now that we have options, add support for TEARDOWN mode (same functionality as existing code) All other modes will be added in follow-up patches. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-8-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: pm_runtime idle schedulingPierre-Louis Bossart2020-08-181-2/+24
| | | | | | | | | | | | | | Add quirk and pm_runtime idle scheduling to let the Master suspend if no Slaves become attached. This can happen when a link is not marked as disabled and has devices exposed in the DSDT, if the power is controlled by sideband means or the link includes a pluggable connector. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Rander Wang <rander.wang@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-7-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: reinitialize IP+DSP in .prepare(), but only when resumingBard Liao2020-08-182-1/+74
| | | | | | | | | | | | | | | | | | | | | The .prepare() callback is invoked for normal streaming, underflows or during the system resume transition. In the latter case, the context for the ALH PDIs is lost, and the DSP is not initialized properly either, but the bus parameters don't need to be recomputed. Conversely, when doing a regular .prepare() during an underflow, the ALH/SHIM registers shall not be changed as the hardware cannot be reprogrammed after the DMA started (hardware spec requirement). This patch adds storage of PDI and hw_params in the DAI dma context, and the difference between the types of .prepare() usages is handled via a simple boolean, updated when suspending, and tested for in the .prepare() case. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-6-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: call helper to reset Slave states on resumePierre-Louis Bossart2020-08-181-0/+12
| | | | | | | | | | | | | | This helps make sure they are all UNATTACHED and reset the state machines. At the moment we perform a bus reset both for system resume and pm_runtime resume, this will be modified when clock-stop mode is supported Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-5-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: fix race condition on system resumePierre-Louis Bossart2020-08-181-0/+12
| | | | | | | | | | | | | | | | | | | Previous patches took care of the case where the master device is pm_runtime 'suspended' when a system suspend occurs. In the case where the master device was not suspended, e.g. if suspend occurred while streaming audio, Intel validation noticed a race condition: the pm_runtime suspend may conflict with the enumeration started by the system resume. This can be simply fixed by updating the status before exiting system resume. GitHub issue: https://github.com/thesofproject/linux/issues/1482 Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: reset pm_runtime status during system resumePierre-Louis Bossart2020-08-181-0/+16
| | | | | | | | | | | | | | | | | | The system resume does the entire bus re-initialization and brings it to full-power. If the device was pm_runtime suspended, there is no need to run the pm_runtime resume sequence after the system runtime. Follow the documentation from runtime_pm.rst, and conditionally disable, set_active and re-enable the device on system resume. Note that pm_runtime_suspended() is used instead of pm_runtime_status_suspended() so that we can deal with the case where pm_runtime is disabled. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: add pm_runtime supportPierre-Louis Bossart2020-08-182-5/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic hooks in DAI .startup and .shutdown callbacks. The SoundWire IP should be powered between those two calls. The power dependencies between SoundWire and DSP are handled with the parent/child relationship, before the SoundWire master device becomes active the parent device will become active and power-up the shared rails. For now the strategy is to rely on complete enumeration when the device becomes active, so the code is a copy/paste of the sequence for system suspend/resume. In future patches, the strategy will optionally be to rely on clock stop if the enumeration time is prohibitive or when the devices connected to a link can signal a wake. A module parameter is added to make integration of new Slave devices easier, to e.g. keep the device active or prevent clock-stop. Note that we need to we have to disable runtime pm before device unregister, otherwise we will see "Failed to power up link: -11" error on module remove test. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200817152923.3259-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: intel: Add basic power management supportPierre-Louis Bossart2020-08-171-2/+79
| | | | | | | | | | | | | Implement suspend/resume capabilities (not runtime_pm for now) The resume part is essentially a full-blown re-enumeration. When S0ix is supported, we will select clock stop mode when the ACPI target state is S0, and tear down the link for S3. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200721203723.18305-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* soundwire: master: enable pm runtimeBard Liao2020-08-171-0/+2
| | | | | | | | | | | | | | The hierarchy of soundwire devices is platform device -> M device -> S device. A S device is physically attached on the platform device. So the platform device should be resumed when a S device is resumed. As the bridge of platform device and S device, we have to implement runtime pm on M driver. We have set runtime pm ops in M driver already, but still need to enable runtime pm. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200726215945.3119-1-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* Linux 5.9-rc1v5.9-rc1Linus Torvalds2020-08-161-2/+2
|
* Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-164-156/+409
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "A few differerent things in here. Seems like syzbot got some more io_uring bits wired up, and we got a handful of reports and the associated fixes are in here. General fixes too, and a lot of them marked for stable. Lastly, a bit of fallout from the async buffered reads, where we now more easily trigger short reads. Some applications don't really like that, so the io_read() code now handles short reads internally, and got a cleanup along the way so that it's now easier to read (and documented). We're now passing tests that failed before" * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block: io_uring: short circuit -EAGAIN for blocking read attempt io_uring: sanitize double poll handling io_uring: internally retry short reads io_uring: retain iov_iter state over io_read/io_write calls task_work: only grab task signal lock when needed io_uring: enable lookup of links holding inflight files io_uring: fail poll arm on queue proc failure io_uring: hold 'ctx' reference around task_work queue + execute fs: RWF_NOWAIT should imply IOCB_NOIO io_uring: defer file table grabbing request cleanup for locked requests io_uring: add missing REQ_F_COMP_LOCKED for nested requests io_uring: fix recursive completion locking on oveflow flush io_uring: use TWA_SIGNAL for task_work uncondtionally io_uring: account locked memory before potential error case io_uring: set ctx sq/cq entry count earlier io_uring: Fix NULL pointer dereference in loop_rw_iter() io_uring: add comments on how the async buffered read retry works io_uring: io_async_buf_func() need not test page bit
| * io_uring: short circuit -EAGAIN for blocking read attemptJens Axboe2020-08-151-1/+4
| | | | | | | | | | | | | | | | | | | | | | One case was missed in the short IO retry handling, and that's hitting -EAGAIN on a blocking attempt read (eg from io-wq context). This is a problem on sockets that are marked as non-blocking when created, they don't carry any REQ_F_NOWAIT information to help us terminate them instead of perpetually retrying. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: sanitize double poll handlingJens Axboe2020-08-151-9/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a bit of confusion on the matching pairs of poll vs double poll, depending on if the request is a pure poll (IORING_OP_POLL_ADD) or poll driven retry. Add io_poll_get_double() that returns the double poll waitqueue, if any, and io_poll_get_single() that returns the original poll waitqueue. With that, remove the argument to io_poll_remove_double(). Finally ensure that wait->private is cleared once the double poll handler has run, so that remove knows it's already been seen. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com Fixes: 18bceab101ad ("io_uring: allow POLL_ADD with double poll_wait() users") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: internally retry short readsJens Axboe2020-08-131-39/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had a few application cases of not handling short reads properly, and it is understandable as short reads aren't really expected if the application isn't doing non-blocking IO. Now that we retain the iov_iter over retries, we can implement internal retry pretty trivially. This ensures that we don't return a short read, even for buffered reads on page cache conflicts. Cleanup the deep nesting and hard to read nature of io_read() as well, it's much more straight forward now to read and understand. Added a few comments explaining the logic as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: retain iov_iter state over io_read/io_write callsJens Axboe2020-08-131-66/+70
| | | | | | | | | | | | | | | | | | | | | | | | Instead of maintaining (and setting/remembering) iov_iter size and segment counts, just put the iov_iter in the async part of the IO structure. This is mostly a preparation patch for doing appropriate internal retries for short reads, but it also cleans up the state handling nicely and simplifies it quite a bit. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * task_work: only grab task signal lock when neededJens Axboe2020-08-132-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If JOBCTL_TASK_WORK is already set on the targeted task, then we need not go through {lock,unlock}_task_sighand() to set it again and queue a signal wakeup. This is safe as we're checking it _after_ adding the new task_work with cmpxchg(). The ordering is as follows: task_work_add() get_signal() -------------------------------------------------------------- STORE(task->task_works, new_work); STORE(task->jobctl); mb(); mb(); LOAD(task->jobctl); LOAD(task->task_works); This speeds up TWA_SIGNAL handling quite a bit, which is important now that io_uring is relying on it for all task_work deliveries. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: enable lookup of links holding inflight filesJens Axboe2020-08-121-10/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a process exits, we cancel whatever requests it has pending that are referencing the file table. However, if a link is holding a reference, then we cannot find it by simply looking at the inflight list. Enable checking of the poll and timeout list to find the link, and cancel it appropriately. Cc: stable@vger.kernel.org Reported-by: Josef <josef.grieb@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fail poll arm on queue proc failureJens Axboe2020-08-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check the ipt.error value, it must have been either cleared to zero or set to another error than the default -EINVAL if we don't go through the waitqueue proc addition. Just give up on poll at that point and return failure, this will fallback to async work. io_poll_add() doesn't suffer from this failure case, as it returns the error value directly. Cc: stable@vger.kernel.org # v5.7+ Reported-by: syzbot+a730016dc0bdce4f6ff5@syzkaller.appspotmail.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: hold 'ctx' reference around task_work queue + executeJens Axboe2020-08-111-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | We're holding the request reference, but we need to go one higher to ensure that the ctx remains valid after the request has finished. If the ring is closed with pending task_work inflight, and the given io_kiocb finishes sync during issue, then we need a reference to the ring itself around the task_work execution cycle. Cc: stable@vger.kernel.org # v5.7+ Reported-by: syzbot+9b260fc33297966f5a8e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * fs: RWF_NOWAIT should imply IOCB_NOIOJens Axboe2020-08-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | With the change allowing read-ahead for IOCB_NOWAIT, we changed the RWF_NOWAIT semantics of only doing cached reads. Since we know have IOCB_NOIO to manage that specific side of it, just make RWF_NOWAIT imply IOCB_NOIO as well to restore the previous behavior. Fixes: 2e85abf053b9 ("mm: allow read-ahead with IOCB_NOWAIT set") Reported-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: defer file table grabbing request cleanup for locked requestsJens Axboe2020-08-101-10/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're in the error path failing links and we have a link that has grabbed a reference to the fs_struct, then we cannot safely drop our reference to the table if we already hold the completion lock. This adds a hardirq dependency to the fs_struct->lock, which it currently doesn't have. Defer the final cleanup and free of such requests to avoid adding this dependency. Reported-by: syzbot+ef4b654b49ed7ff049bf@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: add missing REQ_F_COMP_LOCKED for nested requestsJens Axboe2020-08-101-0/+3
| | | | | | | | | | | | | | | | When we traverse into failing links or timeouts, we need to ensure we propagate the REQ_F_COMP_LOCKED flag to ensure that we correctly signal to the completion side that we already hold the completion lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: fix recursive completion locking on oveflow flushJens Axboe2020-08-101-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syszbot reports a scenario where we recurse on the completion lock when flushing an overflow: 1 lock held by syz-executor287/6816: #0: ffff888093cdb4d8 (&ctx->completion_lock){....}-{2:2}, at: io_cqring_overflow_flush+0xc6/0xab0 fs/io_uring.c:1333 stack backtrace: CPU: 1 PID: 6816 Comm: syz-executor287 Not tainted 5.8.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1f0/0x31e lib/dump_stack.c:118 print_deadlock_bug kernel/locking/lockdep.c:2391 [inline] check_deadlock kernel/locking/lockdep.c:2432 [inline] validate_chain+0x69a4/0x88a0 kernel/locking/lockdep.c:3202 __lock_acquire+0x1161/0x2ab0 kernel/locking/lockdep.c:4426 lock_acquire+0x160/0x730 kernel/locking/lockdep.c:5005 __raw_spin_lock_irq include/linux/spinlock_api_smp.h:128 [inline] _raw_spin_lock_irq+0x67/0x80 kernel/locking/spinlock.c:167 spin_lock_irq include/linux/spinlock.h:379 [inline] io_queue_linked_timeout fs/io_uring.c:5928 [inline] __io_queue_async_work fs/io_uring.c:1192 [inline] __io_queue_deferred+0x36a/0x790 fs/io_uring.c:1237 io_cqring_overflow_flush+0x774/0xab0 fs/io_uring.c:1359 io_ring_ctx_wait_and_kill+0x2a1/0x570 fs/io_uring.c:7808 io_uring_release+0x59/0x70 fs/io_uring.c:7829 __fput+0x34f/0x7b0 fs/file_table.c:281 task_work_run+0x137/0x1c0 kernel/task_work.c:135 exit_task_work include/linux/task_work.h:25 [inline] do_exit+0x5f3/0x1f20 kernel/exit.c:806 do_group_exit+0x161/0x2d0 kernel/exit.c:903 __do_sys_exit_group+0x13/0x20 kernel/exit.c:914 __se_sys_exit_group+0x10/0x10 kernel/exit.c:912 __x64_sys_exit_group+0x37/0x40 kernel/exit.c:912 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by passing back the link from __io_queue_async_work(), and then let the caller handle the queueing of the link. Take care to also punt the submission reference put to the caller, as we're holding the completion lock for the __io_queue_defer() case. Hence we need to mark the io_kiocb appropriately for that case. Reported-by: syzbot+996f91b6ec3812c48042@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: use TWA_SIGNAL for task_work uncondtionallyJens Axboe2020-08-101-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An earlier commit: b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()") ensured that we didn't get stuck waiting for eventfd reads when it's registered with the io_uring ring for event notification, but we still have cases where the task can be waiting on other events in the kernel and need a bigger nudge to make forward progress. Or the task could be in the kernel and running, but on its way to blocking. This means that TWA_RESUME cannot reliably be used to ensure we make progress. Use TWA_SIGNAL unconditionally. Cc: stable@vger.kernel.org # v5.7+ Reported-by: Josef <josef.grieb@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: account locked memory before potential error caseJens Axboe2020-08-061-8/+10
| | | | | | | | | | | | | | | | | | The tear down path will always unaccount the memory, so ensure that we have accounted it before hitting any of them. Reported-by: Tomáš Chaloupka <chalucha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: set ctx sq/cq entry count earlierJens Axboe2020-08-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hit an earlier error path in io_uring_create(), then we will have accounted memory, but not set ctx->{sq,cq}_entries yet. Then when the ring is torn down in error, we use those values to unaccount the memory. Ensure we set the ctx entries before we're able to hit a potential error path. Cc: stable@vger.kernel.org Reported-by: Tomáš Chaloupka <chalucha@gmail.com> Tested-by: Tomáš Chaloupka <chalucha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * io_uring: Fix NULL pointer dereference in loop_rw_iter()Guoyu Huang2020-08-051-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | loop_rw_iter() does not check whether the file has a read or write function. This can lead to NULL pointer dereference when the user passes in a file descriptor that does not have read or write function. The crash log looks like this: [ 99.834071] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 99.835364] #PF: supervisor instruction fetch in kernel mode [ 99.836522] #PF: error_code(0x0010) - not-present page [ 99.837771] PGD 8000000079d62067 P4D 8000000079d62067 PUD 79d8c067 PMD 0 [ 99.839649] Oops: 0010 [#2] SMP PTI [ 99.840591] CPU: 1 PID: 333 Comm: io_wqe_worker-0 Tainted: G D 5.8.0 #2 [ 99.842622] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 99.845140] RIP: 0010:0x0 [ 99.845840] Code: Bad RIP value. [ 99.846672] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202 [ 99.848018] RAX: 0000000000000000 RBX: ffff92363bd67300 RCX: ffff92363d461208 [ 99.849854] RDX: 0000000000000010 RSI: 00007ffdbf696bb0 RDI: ffff92363bd67300 [ 99.851743] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000 [ 99.853394] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010 [ 99.855148] R13: 0000000000000000 R14: ffff92363d461208 R15: ffffa1c7c01ebc68 [ 99.856914] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000 [ 99.858651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 99.860032] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0 [ 99.861979] Call Trace: [ 99.862617] loop_rw_iter.part.0+0xad/0x110 [ 99.863838] io_write+0x2ae/0x380 [ 99.864644] ? kvm_sched_clock_read+0x11/0x20 [ 99.865595] ? sched_clock+0x9/0x10 [ 99.866453] ? sched_clock_cpu+0x11/0xb0 [ 99.867326] ? newidle_balance+0x1d4/0x3c0 [ 99.868283] io_issue_sqe+0xd8f/0x1340 [ 99.869216] ? __switch_to+0x7f/0x450 [ 99.870280] ? __switch_to_asm+0x42/0x70 [ 99.871254] ? __switch_to_asm+0x36/0x70 [ 99.872133] ? lock_timer_base+0x72/0xa0 [ 99.873155] ? switch_mm_irqs_off+0x1bf/0x420 [ 99.874152] io_wq_submit_work+0x64/0x180 [ 99.875192] ? kthread_use_mm+0x71/0x100 [ 99.876132] io_worker_handle_work+0x267/0x440 [ 99.877233] io_wqe_worker+0x297/0x350 [ 99.878145] kthread+0x112/0x150 [ 99.878849] ? __io_worker_unuse+0x100/0x100 [ 99.879935] ? kthread_park+0x90/0x90 [ 99.880874] ret_from_fork+0x22/0x30 [ 99.881679] Modules linked in: [ 99.882493] CR2: 0000000000000000 [ 99.883324] ---[ end trace 4453745f4673190b ]--- [ 99.884289] RIP: 0010:0x0 [ 99.884837] Code: Bad RIP value. [ 99.885492] RSP: 0018:ffffa1c7c01ebc08 EFLAGS: 00010202 [ 99.886851] RAX: 0000000000000000 RBX: ffff92363acd7f00 RCX: ffff92363d461608 [ 99.888561] RDX: 0000000000000010 RSI: 00007ffe040d9e10 RDI: ffff92363acd7f00 [ 99.890203] RBP: ffffa1c7c01ebc40 R08: 0000000000000000 R09: 0000000000000000 [ 99.891907] R10: ffffffff9ec692a0 R11: 0000000000000000 R12: 0000000000000010 [ 99.894106] R13: 0000000000000000 R14: ffff92363d461608 R15: ffffa1c7c01ebc68 [ 99.896079] FS: 0000000000000000(0000) GS:ffff92363dd00000(0000) knlGS:0000000000000000 [ 99.898017] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 99.899197] CR2: ffffffffffffffd6 CR3: 000000007ac66000 CR4: 00000000000006e0 Fixes: 32960613b7c3 ("io_uring: correctly handle non ->{read,write}_iter() file_operations") Cc: stable@vger.kernel.org Signed-off-by: Guoyu Huang <hgy5945@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>