summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | nvme: Don't suspend admin queue that wasn't createdGabriel Krisman Bertazi2016-09-071-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression in my previous commit c21377f8366c ("nvme: Suspend all queues before deletion"), which provoked an Oops in the removal path when removing a device that became IO incapable very early at probe (i.e. after a failed EEH recovery). Turns out, if the error occurred very early at the probe path, before even configuring the admin queue, we might try to suspend the uninitialized admin queue, accessing bad memory. Fixes: c21377f8366c ("nvme: Suspend all queues before deletion") Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | nvme: make NVME_RDMA depend on BLOCKLinus Torvalds2016-09-111-1/+1
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") removed the dependency on BLK_DEV_NVME, but the cdoe does depend on the block layer (which used to be an implicit dependency through BLK_DEV_NVME). Otherwise you get various errors from the kbuild test robot random config testing when that happens to hit a configuration with BLOCK device support disabled. Cc: Christoph Hellwig <hch@lst.de> Cc: Jay Freyensee <james_p_freyensee@linux.intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | / / Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵Jens Axboe2016-08-294-27/+46
|\| | | | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-linus Sagi writes: Mostly stability fixes and cleanups: - NQN endianess fix from Daniel - possible use-after-free fix from Vincent - nvme-rdma connect semantics fixes from Jay - Remove redundant variables in rdma driver - Kbuild fix from Christoph - nvmf_host referencing fix from Christoph - uninit variable fix from Colin
| * | nvme-rdma: Get rid of redundant definesSagi Grimberg2016-08-281-4/+0
| | | | | | | | | | | | | | | Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | nvme-rdma: Get rid of duplicate variableSagi Grimberg2016-08-281-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | We already have need_inval in ib_mr, lets use that instead. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * | nvme: fabrics drivers don't need the nvme-pci driverChristoph Hellwig2016-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | So select the NVME_CORE symbol instead of depending on BLK_DEV_NVME. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-fabrics: get a reference when reusing a nvme_host structureChristoph Hellwig2016-08-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Without this we'll get a use after free after connecting two controller using the same hostnqn and then disconnecting one of them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-fabrics: change NQN UUID to big-endian formatDaniel Verkamp2016-08-192-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the UUID format of NQN uses a UUID based on RFC 4122. RFC 4122 specifies that the UUID is encoded in big-endian byte order. Switch the NVMe over Fabrics host ID field from little-endian UUID to big-endian UUID to match the specification. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-rdma: fix sqsize/hsqsize per specJay Freyensee2016-08-181-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as a 0-based value. Also per spec, the RDMA binding values shall be set to sqsize, which makes hsqsize 0-based values. Thus, the sqsize during NVMf connect() is now: [root@fedora23-fabrics-host1 for-48]# dmesg [ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31 [ 318.720884] nvme nvme0: creating 16 I/O queues. [ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127 Finally, current interpretation implies hrqsize is 1's based so set it appropriately. Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | fabrics: define admin sqsize min default, per specJay Freyensee2016-08-182-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon admin queue connect(), the rdma qp was being set based on NVMF_AQ_DEPTH. However, the fabrics layer was using the sqsize field value set for I/O queues for the admin queue, which threw the nvme layer and rdma layer off-whack: root@fedora23-fabrics-host1 nvmf]# dmesg [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize being sent is: 128 [ 3507.798858] nvme nvme0: creating 16 I/O queues. [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420 Thus, to have a different admin queue value, we use NVMF_AQ_DEPTH for connect() and RDMA private data as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec (and in that RDMA private data we treat hrqsize as 1's-based value, per current understanding of the fabrics spec). Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
| * | nvme-rdma: initialize ret to zero to avoid returning garbageColin Ian King2016-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | ret is not initialized so it contains garbage. Ensure garbage is not returned by initializing rc to 0. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | | nvme: Fix nvme_get/set_features() with a NULL result pointerAndy Lutomirski2016-08-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_set_features() callers seem to expect that passing NULL as the result pointer is acceptable. Teach nvme_set_features() not to try to write to the NULL address. For symmetry, make the same change to nvme_get_features(), despite the fact that all current callers pass a valid result pointer. I assume that this bug hasn't been reported in practice because the callers that pass NULL are all in the SCSI translation layer and no one uses the relevant operations. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | nvme: Prevent controller state invalid transitionGabriel Krisman Bertazi2016-08-151-2/+5
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Acquiring the nvme_ctrl lock before reading ctrl->state in nvme_change_ctrl_state() should prevent a theoretical invalid state transition, in the event of two threads racing inside that function. I haven't been able to observe this happening with the current code, and the current state machine seems to be simple enough to not be affected by these invalid transitions, but future modifications could make it more likely to happen. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sag@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme: Suspend all queues before deletionGabriel Krisman Bertazi2016-08-111-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When nvme_delete_queue fails in the first pass of the nvme_disable_io_queues() loop, we return early, failing to suspend all of the IO queues. Later, on the nvme_pci_disable path, this causes us to disable MSI without actually having freed all the IRQs, which triggers the BUG_ON in free_msi_irqs(), as show below. This patch refactors nvme_disable_io_queues to suspend all queues before start submitting delete queue commands. This way, we ensure that we have at least returned every IRQ before continuing with the removal path. [ 487.529200] kernel BUG at ../drivers/pci/msi.c:368! cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650] pc: c000000000627a50: free_msi_irqs+0x90/0x200 lr: c000000000627a40: free_msi_irqs+0x80/0x200 sp: c0000078c5b838d0 msr: 9000000100029033 current = 0xc0000078c5b40000 paca = 0xc000000002bd7600 softe: 0 irq_happened: 0x01 pid = 1376, comm = kworker/70:1H kernel BUG at ../drivers/pci/msi.c:368! Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016 enter ? for help [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme] [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme] [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110 [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170 [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110 [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0 [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590 [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660 [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130 [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-nvme@lists.infradead.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into ↵Jens Axboe2016-08-081-38/+45
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for-linus Sagi writes: Mostly stability fixes for nvmet, rdma: - fix uninitialized rdma_cm private data from Roland. - rdma device removal handling (host and target). - fix controller disconnect during active mounts. - fix namespaces lost after fabric reconnects. - remove redundant calls to namespace removal (rdma, loop). - actually send controller shutdown when disconnecting. - reconnect fixes (ns rescan and aen requeue) - nvmet controller serial number inconsistency fix.
| * nvme-rdma: Remove unused includesSagi Grimberg2016-08-041-3/+0
| | | | | | | | | | Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
| * nvme-rdma: start async event handler after reconnecting to a controllerSagi Grimberg2016-08-041-0/+2
| | | | | | | | | | | | | | | | | | When we reset or reconnect to a controller, we are cancelling the async event handler so we can safely re-establish resources, but we need to remember to start it again when we successfully reconnect. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Make sure to shutdown the controller if we canSagi Grimberg2016-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it). Instead, check that the admin queue is connected. Note that it is safe because we can never see a copmeting thread trying to destroy the admin queue (reset or delete controller). Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Free the I/O tags when we delete the controllerSagi Grimberg2016-08-031-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we wait until we free the controller (free_ctrl) we might lose our rdma device without any notification while we still have open resources (tags mrs and dma mappings). Instead, destroy the tags with their rdma resources once we delete the device and not when freeing it. Note that we don't do that in nvme_rdma_shutdown_ctrl because controller reset uses it as well and we want to give active I/O a chance to complete successfully. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Remove duplicate call to nvme_remove_namespacesSagi Grimberg2016-08-031-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_uninit_ctrl already does that for us. Note that we reordered nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl, this is perfectly fine because we actually want ctrl uninit (aen, scan cancellation and namespaces removal) to happen before we shutdown the rdma resources. Also, centralize the deletion work and the dead controller removal work code duplication into __nvme_rdma_shutdown_ctrl that accepts a shutdown boolean. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Fix device removal handlingSagi Grimberg2016-08-031-20/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device removal sequence may have crashed because the controller (and admin queue space) was freed before we destroyed the admin queue resources. Thus we want to destroy the admin queue and only then queue controller deletion and wait for it to complete. More specifically we: 1. own the controller deletion (make sure we are not competing with another deletion). 2. get rid of inflight reconnects if exists (which also destroy and create queues). 3. destroy the queue. 4. safely queue controller deletion (and wait for it to complete). Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Queue ns scanning after a sucessful reconnectionSagi Grimberg2016-08-031-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On an ordered target shutdown, the target can send a AEN on a namespace removal, this will trigger the host to queue ns-list query. The shutdown will trigger error recovery which will attepmt periodic reconnect. We can hit a race where the ns rescanning fails (error recovery kicked in and we're not connected) causing removing all the namespaces and when we reconnect we won't see any namespaces for this controller. So, queue a namespace rescan after we successfully reconnected to the target. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
| * nvme-rdma: Don't leak uninitialized memory in connect request private dataRoland Dreier2016-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | Zero out the full nvme_rdma_cm_req structure before sending it. Otherwise we end up leaking kernel memory in the reserved field, which might break forward compatibility in the future. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* | Merge tag 'pci-v4.8-changes' of ↵Linus Torvalds2016-08-021-12/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Highlights: - ARM64 support for ACPI host bridges - new drivers for Axis ARTPEC-6 and Marvell Aardvark - new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx - pci_resource_to_user() cleanup (more to come) Detailed summary: Enumeration: - Move ecam.h to linux/include/pci-ecam.h (Jayachandran C) - Add parent device field to ECAM struct pci_config_window (Jayachandran C) - Add generic MCFG table handling (Tomasz Nowicki) - Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki) - Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki) Resource management: - Add devm_request_pci_bus_resources() (Bjorn Helgaas) - Unify pci_resource_to_user() declarations (Bjorn Helgaas) - Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas) - Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas) - Make PCI I/O space optional on ARM32 (Bjorn Helgaas) - Ignore write combining when mapping I/O port space (Bjorn Helgaas) - Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas) - Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas) - Support I/O resources when parsing host bridge resources (Jayachandran C) - Add helpers to request/release memory and I/O regions (Johannes Thumshirn) - Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn) - Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5)) - Add generic pci_bus_claim_resources() (Lorenzo Pieralisi) - Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) - Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi) - Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya) - Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu) PCI device hotplug: - Allow additional bus numbers for hotplug bridges (Keith Busch) - Ignore interrupts during D3cold (Lukas Wunner) Power management: - Enforce type casting for pci_power_t (Andy Shevchenko) - Don't clear d3cold_allowed for PCIe ports (Mika Westerberg) - Put PCIe ports into D3 during suspend (Mika Westerberg) - Power on bridges before scanning new devices (Mika Westerberg) - Runtime resume bridge before rescan (Mika Westerberg) - Add runtime PM support for PCIe ports (Mika Westerberg) - Remove redundant check of pcie_set_clkpm (Shawn Lin) Virtualization: - Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra) - Add DMA alias quirk for Adaptec 3805 (Alex Williamson) - Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake) - Add ACS quirk for Solarflare SFC9220 (Edward Cree) MSI: - Fix PCI_MSI dependencies (Arnd Bergmann) - Add pci_msix_desc_addr() helper (Christoph Hellwig) - Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig) - Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig) - Provide sensible IRQ vector alloc/free routines (Christoph Hellwig) - Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig) Error Handling: - Bind DPC to Root Ports as well as Downstream Ports (Keith Busch) - Remove DPC tristate module option (Keith Busch) - Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg) Generic host bridge driver: - Select IRQ_DOMAIN (Arnd Bergmann) - Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) ACPI host bridge driver: - Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki) - Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki) - Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki) - Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki) Altera host bridge driver: - Check link status before retrain link (Ley Foon Tan) - Poll for link up status after retraining the link (Ley Foon Tan) Axis ARTPEC-6 host bridge driver: - Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann) - Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel) - Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel) Intel VMD host bridge driver: - Use lock save/restore in interrupt enable path (Jon Derrick) - Select device dma ops to override (Keith Busch) - Initialize list item in IRQ disable (Keith Busch) - Use x86_vector_domain as parent domain (Keith Busch) - Separate MSI and MSI-X vector sharing (Keith Busch) Marvell Aardvark host bridge driver: - Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni) - Add Aardvark PCI host controller driver (Thomas Petazzoni) - Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni) Microsoft Hyper-V host bridge driver: - Fix interrupt cleanup path (Cathy Avery) - Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov) - Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov) NVIDIA Tegra host bridge driver: - Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren) - Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren) - Use lower-case hex consistently for register definitions (Thierry Reding) - Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding) - Stop setting pcibios_min_mem (Thierry Reding) Renesas R-Car host bridge driver: - Drop gen2 dummy I/O port region (Bjorn Helgaas) TI DRA7xx host bridge driver: - Fix return value in case of error (Christophe JAILLET) Xilinx AXI host bridge driver: - Fix return value in case of error (Christophe JAILLET) Miscellaneous: - Make bus_attr_resource_alignment static (Ben Dooks) - Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks) - MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven) - Make host bridge drivers explicitly non-modular (Paul Gortmaker)" * tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits) PCI: xgene: Make explicitly non-modular PCI: thunder-pem: Make explicitly non-modular PCI: thunder-ecam: Make explicitly non-modular PCI: tegra: Make explicitly non-modular PCI: rcar-gen2: Make explicitly non-modular PCI: rcar: Make explicitly non-modular PCI: mvebu: Make explicitly non-modular PCI: layerscape: Make explicitly non-modular PCI: keystone: Make explicitly non-modular PCI: hisi: Make explicitly non-modular PCI: generic: Make explicitly non-modular PCI: designware-plat: Make it explicitly non-modular PCI: artpec6: Make explicitly non-modular PCI: armada8k: Make explicitly non-modular PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency PCI: Add ACS quirk for Solarflare SFC9220 arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700 PCI: aardvark: Add Aardvark PCI host controller driver dt-bindings: add DT binding for the Aardvark PCIe controller PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values ...
| * NVMe: Use pci_(request|release)_mem_regionsJohannes Thumshirn2016-06-211-7/+3
| | | | | | | | | | | | | | | | | | | | | | Now that we do have pci_request_mem_regions() and pci_release_mem_regions() at hand, use it in the NVMe driver. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Keith Busch <keith.busch@intel.com> CC: Jens Axboe <axboe@fb.com>
* | Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2016-07-269-86/+3479
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...
| * | nvme/pci: Provide SR-IOV supportKeith Busch2016-07-201-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | This registers an sr-iov callback for nvme. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: initialize variable before logical OR'ing itJay Freyensee2016-07-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is typically not good coding or secure coding practice to logical OR a variable without an initialization value first. Here on this line: integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE; BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable never set to an initial value. This patch fixes that. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | block: ensure bios return from blk_get_request are properly initializedChristoph Hellwig2016-07-201-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_get_request is used for BLOCK_PC and similar passthrough requests. Currently we always need to call blk_rq_set_block_pc or an open coded version of it to allow appending bios using the request mapping helpers later on, which is a somewhat awkward API. Instead move the initialization part of blk_rq_set_block_pc into blk_get_request, so that we always have a safe to use request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | block: get rid of bio_rw and READAChristoph Hellwig2016-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | NVMe: don't allocate unused nvme_majorNeilBrown2016-07-141-15/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When alloc_disk(0) is used, the ->major number is ignored. All device numbers are allocated with a major of BLOCK_EXT_MAJOR. So remove all references to nvme_major. [akpm@linux-foundation.org: one unregister_blkdev() was missed] Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: avoid crashes when node 0 is memoryless node.Masayoshi Mizuma2016-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_NUMA is enabled and node 0 is memoryless, the system crashes because nvme_probe() sets the device->numa_node to 0 by set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0. To avoid the crash, we should change the 0 to first_memory_node. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: Limit command retriesKeith Busch2016-07-123-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many controller implementations will return errors to commands that will not succeed, but without the DNR bit set. The driver previously retried these commands an unlimited number of times until the command timeout has exceeded, which takes an unnecessarilly long period of time. This patch limits the number of retries a command can have, defaulting to 5, but is user tunable at load or runtime. The struct request's 'retries' field is used to track the number of retries attempted. This is in contrast with scsi's use of this field, which indicates how many retries are allowed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-fabrics: add-remove ctrl repeat fixMing Lin2016-07-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Repeatedly adding then removing the same NVMe-over-Fabrics controller over and over again (shown below) can cause a kernel crash (also shown below). This patch fixes that. [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics [nvmf]# ./remove_nvme_connections.sh 2 echo 1 > /sys/class/nvme/nvme0/delete_controller echo 1 > /sys/class/nvme/nvme1/delete_controller [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics Killed [nvmf]# dmesg [ 313.416908] nvme nvme0: creating 16 I/O queues. [ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr 192.168.1.100:4420 [ 313.524857] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.525490] PGD 0 [ 313.525726] Oops: 0000 [#1] SMP [ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en mlx4_ib ib_core mlx4_core [ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted 4.7.0-rc2+ #2 [ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT -F/IBQF/IBFF, BIOS 1.0a 10/09/2012 [ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti: ffff88025b980000 [ 313.527879] RIP: 0010:[<ffffffff8136c60e>] [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206 [ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX: fffffffffffffff1 [ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI: 0000000000000011 [ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09: ffff880471879058 [ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12: ffff880471879800 [ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15: fffffffffffffff8 [ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000) knlGS:0000000000000000 [ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4: 00000000000406e0 [ 313.529989] Stack: [ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001 0000000000000059 [ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0 ffff880471879059 [ 313.531162] ffff88047187904b ffff880471879058 0000000000000000 ffff88047736e000 [ 313.531629] Call Trace: [ 313.531797] [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840 [nvme_fabrics] [ 313.531974] [<ffffffff81180b53>] __vfs_write+0x23/0x120 [ 313.532146] [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0 [ 313.532316] [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170 [ 313.532487] [<ffffffff811811f3>] vfs_write+0xb3/0x1b0 [ 313.532658] [<ffffffff8117e321>] ? filp_close+0x51/0x70 [ 313.532845] [<ffffffff811824e1>] SyS_write+0x41/0xa0 [ 313.533016] [<ffffffff8183055b>] entry_SYSCALL_64_fastpath+0x13/0x8f [ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83 c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31 [ 313.536563] RIP [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.536815] RSP <ffff88025b983db0> [ 313.536981] CR2: 0000000000000010 [ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]--- Reported-and-tested-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <mlin@kernel.org> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-fabrics: Remove tl_retry_countSagi Grimberg2016-07-122-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The timeout before error recovery logic kicks in is dictated by the nvme keep-alive, so we don't really need a transport layer retry count. transports can retry for as much as they like. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-rdma: Don't use tl_retry_countSagi Grimberg2016-07-121-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Always use the maximum qp retry count as the error recovery timeout is dictated from the nvme keep-alive. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme-rdma: fix the return value of nvme_rdma_reinit_request()Wei Yongjun2016-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | PTR_ERR should be applied before its argument is reassigned, otherwise the return value will be set to 0, not error code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme/quirk: Add a delay before checking for adapter readinessGuilherme G. Piccoli2016-07-123-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When disabling the controller, the specification says the register NVME_REG_CC should be written and then driver needs to wait the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system. After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) end up being removed if we issue a reset_controller, because driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in case of this adapter - the driver must wait before start reading NVME_REG_CSTS register on this specific device. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | Merge branch 'for-4.8/block' of ↵Jens Axboe2016-07-081-2/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm into for-4.8/drivers Dan writes: "The removal of ->driverfs_dev in favor of just passing the parent device in as a parameter to add_disk(). See below, it has received a "Reviewed-by" from Christoph, Bart, and Johannes. It is also a pre-requisite for Fam Zheng's work to cleanup gendisk uevents vs attribute visibility [1]. We would extend device_add_disk() to take an attribute_group list. This is based off a branch of block.git/for-4.8/drivers and has received a positive build success notification from the kbuild robot across several configs. [1]: "gendisk: Generate uevent after attribute available" http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"
| | * | block: convert to device_add_disk()Dan Williams2016-06-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk *disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk *disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | | nvme-rdma: add a NVMe over Fabrics RDMA host driverChristoph Hellwig2016-07-083-0/+2040
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the RDMA host (initiator in SCSI speak) driver. It can be used to connect to remote NVMe over Fabrics controllers over Infiniband, RoCE or iWarp, and uses the existing NVMe core driver as well a the new fabrics library. To connect to all NVMe over Fabrics controller reachable on a given taget port using RDMA/CM use the following command: nvme connect-all -t rdma -a $IPADDR This requires the latest version of nvme-cli with Fabrics support. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: add new reconnecting controller stateChristoph Hellwig2016-07-082-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nvme fabric (RDMA, FC, etc...) can introduce port, link or node failures that may require a reconnect to re-establish the connection. Add a new reconnecting state that will initially be used by the RDMA driver. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: lightnvm: make MLC num_pairs little endianJohannes Thumshirn2016-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the OpenChannel SSD interface specification the NAND flash MLC page pairing information's number of page page pairings field is the first two bytes in the MLC Page Pairing data structure. The hardware's data structure itself is little endian so annotate it as such, like the rest of lighnvm's data structures. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: add keep-alive supportSagi Grimberg2016-07-054-1/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Periodic keep-alive is a mandatory feature in NVMe over Fabrics, and optional in NVMe 1.2.1 for PCIe. This patch adds periodic keep-alive sent from the host to verify that the controller is still responsive and vice-versa. The keep-alive timeout is user-defined (with keep_alive_tmo connection parameter) and defaults to 5 seconds. In order to avoid a race condition where the host sends a keep-alive competing with the target side keep-alive timeout expiration, the host adds a grace period of 10 seconds when publishing the keep-alive timeout to the target. In case a keep-alive failed (or timed out), a transport specific error recovery kicks in. For now only NVMe over Fabrics is wired up to support keep alive, but we can add PCIe support easily once controllers actually supporting it become available. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@chelsio.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme-fabrics: add a generic NVMe over Fabrics libraryChristoph Hellwig2016-07-056-1/+1098
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NVMe over Fabrics library provides an interface for both transports and the nvme core to handle fabrics specific commands and attributes independent of the underlying transport. In addition, the fabrics library adds a misc device interface that allow actually creating a fabrics controller, as we can't just autodiscover it like in the PCI case. The nvme-cli utility has been enhanced to use this interface to support fabric connect and discovery. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>, Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>, Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme.h: add NVMe over Fabrics definitionsChristoph Hellwig2016-07-052-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NVMe over Fabrics specification defines a protocol interface and related extensions to NVMe that enable operation over network protocols. The NVMe over Fabrics specification has an NVMe Transport binding for each NVMe Transport. This patch adds the fabrics related definitions: - fabric specific command set and error codes - transport addressing and binding definitions - fabrics sgl extensions - controller identification fabrics enhancements - discovery log page definition Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: add fabrics sysfs attributesMing Lin2016-07-053-3/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - delete_controller: This attribute allows to delete a controller. A driver is not obligated to support it (pci doesn't) so it is created only if the driver supports it. The new fabrics drivers will support it (essentialy a disconnect operation). Usage: echo > /sys/class/nvme/nvme0/delete_controller - subsysnqn: This attribute shows the subsystem nqn of the configured device. If a driver does not implement the get_subsysnqn method, the file will not appear in sysfs. - transport: This attribute shows the transport name. Added a "name" field to struct nvme_ctrl_ops. For loop, cat /sys/class/nvme/nvme0/transport loop For RDMA, cat /sys/class/nvme/nvme0/transport rdma For PCIe, cat /sys/class/nvme/nvme0/transport pcie - address: This attributes shows the controller address. The fabrics drivers that will implement get_address can show the address of the connected controller. example: cat /sys/class/nvme/nvme0/address traddr=192.168.2.2,trsvcid=1023 Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: Modify and export sync command submission for fabricsChristoph Hellwig2016-07-053-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVMe over fabrics will use __nvme_submit_sync_cmd in the the transport and require a few tweaks to it. For that we export it and add a few more paramters: 1. allow passing a queue ID to the block layer For the NVMe over Fabrics connect command we need to able to specify a queue ID that we want to send the command on. Add a qid parameter to the relevant functions to enable this behavior. 2. allow submitting at_head commands In cases where we want to (re)connect to a controller where we have inflight queued commands we want to first connect and only then allow the other queued commands to be kicked. This will prevents failures in controller resets and reconnects. 3. allow passing flags to blk_mq_allocate_request Both for Fabrics connect the the keep-alive feature in NVMe 1.2.1 we want to be able to use reserved requests. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | nvme: allow transitioning from NEW to LIVE stateChristoph Hellwig2016-07-051-0/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For Fabrics we're not going through an intermediate reset state (at least for now). Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | nvme: move the workaround for I/O queue-less controllers from PCIe to coreChristoph Hellwig2016-06-122-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to apply this to Fabrics drivers as well, so move it to common code. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>