path: root/include/linux/nvme.h
Commit message | Author | Age | Files | Lines
* NVMe: Fix hot cpu notification dead lock | Keith Busch | 2014-06-13 | 1 | -1/+1
    There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Rename io_timeout to nvme_io_timeout | Matthew Wilcox | 2014-06-03 | 1 | -2/+2
    It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Flush with data support | Keith Busch | 2014-05-05 | 1 | -1/+0
    It is possible a filesystem may send a flush flagged bio with write data. There is no such composite NVMe command, so the driver sends flush and write separately. The device is allowed to execute these commands in any order, so it was possible the driver ends the bio after the write completes, but while the flush is still active. We don't want to let a filesystem believe flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix, this patch splits the flush and write into chained bios.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Configure support for block flush | Keith Busch | 2014-05-05 | 1 | -0/+1
    This configures an nvme request_queue as flush capable if the device has a volatile write cache present.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update copyright headers | Matthew Wilcox | 2014-05-05 | 1 | -5/+1
    Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* Merge git://git.infradead.org/users/willy/linux-nvme | Linus Torvalds | 2014-04-11 | 1 | -9/+12
|\
| |   Pull NVMe driver updates from Matthew Wilcox:
| |     "Various updates to the NVMe driver. The most user-visible change is that drive hotplugging now works and CPU hotplug while an NVMe drive is installed should also work better"
| |
| |   * git://git.infradead.org/users/willy/linux-nvme:
| |       NVMe: Retry failed commands with non-fatal errors
| |       NVMe: Add getgeo to block ops
| |       NVMe: Start-stop nvme_thread during device add-remove.
| |       NVMe: Make I/O timeout a module parameter
| |       NVMe: CPU hot plug notification
| |       NVMe: per-cpu io queues
| |       NVMe: Replace DEFINE_PCI_DEVICE_TABLE
| |       NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds
| |       NVMe: IOCTL path RCU protect queue access
| |       NVMe: RCU protected access to io queues
| |       NVMe: Initialize device reference count earlier
| |       NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
| * NVMe: Retry failed commands with non-fatal errors | Keith Busch | 2014-04-10 | 1 | -2/+2
| |   For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely.
| |   Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later.
| |   The nvme_iod conveniently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   [fixed checkpatch issue with long line]
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * NVMe: Make I/O timeout a module parameter | Keith Busch | 2014-04-10 | 1 | -1/+2
| |   Increase the default timeout to 30 seconds to match SCSI.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   [use byte instead of ushort]
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * NVMe: CPU hot plug notification | Keith Busch | 2014-04-10 | 1 | -0/+1
| |   Registers with hot cpu notification to rebalance, and potentially allocate additional, io queues.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * NVMe: per-cpu io queues | Keith Busch | 2014-04-10 | 1 | -1/+5
| |   The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map a qid to a cpu. This provides a convenient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in these situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * NVMe: IOCTL path RCU protect queue access | Keith Busch | 2014-03-24 | 1 | -4/+1
| |   This adds rcu protected access to a queue in the nvme IOCTL path to fix potential races between a surprise removal and queue usage in nvme_submit_sync_cmd. The fix holds the rcu_read_lock() here to prevent the nvme_queue from freeing while this path is executing so it can't sleep, and so this path will no longer wait for an available command id should they all be in use at the time a passthrough IOCTL request is received.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * NVMe: RCU protected access to io queues | Keith Busch | 2014-03-24 | 1 | -1/+1
| |   This adds rcu protected access to nvme_queue to fix a race between a surprise removal freeing the queue and a thread with an open reference on an NVMe block device using that queue.
| |   The queues do not need to be rcu protected during the initialization or shutdown parts, so I've added a helper function for raw dereferencing to get around the sparse errors.
| |   There is still a hole in the IOCTL path for the same problem, which is fixed in a subsequent patch.
| |   Signed-off-by: Keith Busch <keith.busch@intel.com>
| |   Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* | nvme: don't use PREPARE_WORK | Tejun Heo | 2014-03-07 | 1 | -0/+1
|/
    PREPARE_[DELAYED_]WORK() are being phased out. They have few users and a nasty surprise in terms of reentrancy guarantee as workqueue considers work items to be different if they don't have the same work function.
    nvme_dev->reset_work is multiplexed with multiple work functions. Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and always use it as the work function and update the users to set the ->reset_workfn field instead of overriding the work function using PREPARE_WORK().
    It would probably be best to route this with other related updates through the workqueue tree.
    Compile tested.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Matthew Wilcox <willy@linux.intel.com>
    Cc: linux-nvme@lists.infradead.org
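The dispatch pattern described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct work`, `struct dev`, `dev_of()`, `do_reset()`, `do_remove()` and `run_work()` are hypothetical stand-ins, not the kernel's workqueue API or the driver's real types. The point is that the work item's function never changes; only the field it forwards to does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel work item. */
struct work;
typedef void (*work_fn)(struct work *);

struct work {
    work_fn fn;              /* installed once at init, never swapped afterwards */
};

/* Hypothetical device; only the fields needed for the illustration. */
struct dev {
    struct work reset_work;
    work_fn reset_workfn;    /* the multiplexed target, changed per request */
    int resets;
    int removals;
};

/* Recover the enclosing device from its embedded work item. */
#define dev_of(w) ((struct dev *)((char *)(w) - offsetof(struct dev, reset_work)))

static void do_reset(struct work *w)  { dev_of(w)->resets++; }
static void do_remove(struct work *w) { dev_of(w)->removals++; }

/* Plays the role the commit gives to nvme_reset_workfn(): the single
 * work function, which forwards to whatever ->reset_workfn is set. */
static void reset_workfn_trampoline(struct work *w)
{
    assert(dev_of(w)->reset_workfn != NULL);
    dev_of(w)->reset_workfn(w);
}

/* Queueing is simulated here by calling the work function directly. */
static void run_work(struct work *w)
{
    w->fn(w);
}
```

Because `reset_work.fn` is always the same function, a scheduler that identifies work items by their function pointer sees one consistent item, whichever action was requested.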
* NVMe: Abort timed out commands | Keith Busch | 2014-01-27 | 1 | -0/+1
    Send nvme abort command to io requests that have timed out on an initialized device. If the command is not returned after another timeout, schedule the controller for reset.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [fix endianness issues]
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Schedule reset for failed controllers | Keith Busch | 2014-01-27 | 1 | -0/+1
    Schedules a controller reset when it indicates it has a failed status. If the device does not become ready after a reset, the pci device will be scheduled for removal.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    [fixed checkpatch issue]
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Device resume error handling | Keith Busch | 2013-12-16 | 1 | -0/+1
    Adds controller error handling on resume power management. If the device fails to initialize, the device is queued for a reset. If the reset fails, a thread is spawned to remove the pci device.
    If the device resumes as "busy", the device is responding to admin commands but will not create IO queues. In this case, we need to remove the gendisks and free the IO queues since they can't be used and may be holding bios in their lists.
    From testing, the dma pools require a pci device so this had to change the pci driver 'remove' to release the dma resources in line with that call instead of after all references to the device are released.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: compat SG_IO ioctl | Keith Busch | 2013-12-16 | 1 | -0/+1
    For 32-bit versions of sg3-utils running on a 64-bit system. This is mostly a copy from the relevant portions of fs/compat_ioctl.c, with slight modifications for going through block_device_operations.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
    [fixed up CONFIG_COMPAT=n build problems]
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Avoid shift operation when writing cq head doorbell | Haiyan Hu | 2013-11-18 | 1 | -1/+1
    Changes the type of dev->db_stride to unsigned and changes the value stored there to be 1 << the current value. Then there is less calculation to be done at completion time.
    Signed-off-by: Haiyan Hu <huhaiyan@huawei.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
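A minimal sketch of the idea, assuming doorbells are addressed as 32-bit words and that the completion-queue head doorbell sits one stride after the submission-queue tail doorbell. The function names and the `sq_tail_db` parameter are illustrative, not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Before: the raw stride exponent is stored, so every completion
 * has to shift before it can locate the CQ head doorbell. */
static uint32_t *cq_head_db_shift(uint32_t *sq_tail_db, unsigned stride_exp)
{
    assert(stride_exp < 16);   /* the stride exponent is a 4-bit field */
    return sq_tail_db + (1u << stride_exp);
}

/* After: 1 << exponent is computed once at initialisation and stored,
 * so every completion only performs an addition. */
static uint32_t *cq_head_db_stored(uint32_t *sq_tail_db, unsigned db_stride)
{
    return sq_tail_db + db_stride;
}
```

Both functions return the same address; the second moves the shift out of the completion path into one-time setup.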
* NVMe: Use normal shutdown | Keith Busch | 2013-09-03 | 1 | -0/+2
    The NVMe spec recommends using the shutdown normal sequence when safely taking the controller offline instead of hitting CC.EN on the next start-up to reset the controller. The spec recommends a minimum of 1 second for the shutdown complete. This patch waits 2 seconds to be on the safe side.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Namespace IDs are unsigned | Matthew Wilcox | 2013-09-03 | 1 | -1/+1
    The 'Number of Namespaces' read from the device was being treated as signed, which would cause us to not scan any namespaces for a device with more than 2 billion namespaces. That led to noticing that the namespace ID was also being treated as signed, which could lead to the result from NVME_IOCTL_ID being treated as an error code.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
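The first failure mode can be illustrated with a small sketch. The function names and the `cap` parameter (which only stops the demonstration loop early) are hypothetical, and the out-of-range conversion to `int` is implementation-defined in C; the sketch assumes the usual two's-complement wraparound seen with GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many namespaces a scan loop would visit when the count is
 * held in a signed int. 'cap' bounds the loop for demonstration only. */
static unsigned scans_signed(uint32_t raw_nn, unsigned cap)
{
    assert(cap > 0);
    int nn = (int)raw_nn;      /* assumed to wrap negative above INT32_MAX */
    unsigned count = 0;
    for (int i = 1; i <= nn && count < cap; i++)
        count++;
    return count;
}

/* The same loop with the count kept unsigned. */
static unsigned scans_unsigned(uint32_t raw_nn, unsigned cap)
{
    assert(cap > 0);
    uint32_t nn = raw_nn;
    unsigned count = 0;
    for (uint32_t i = 1; i <= nn && count < cap; i++)
        count++;
    return count;
}
```

With a count just above 2^31, the signed loop body never runs because `1 <= nn` is already false, while the unsigned loop proceeds normally.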
* NVMe: Split header file into user-visible and kernel-visible pieces | Matthew Wilcox | 2013-09-03 | 1 | -456/+5
    To build user programs that call the NVMe ioctls, we need to have a user header file. Catch up to the new way of doing that by splitting the header file into kernel and uapi portions.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Disk IO statistics | Keith Busch | 2013-06-20 | 1 | -0/+1
    Add io stats accounting for bio requests so nvme block devices show useful disk stats.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Simplify Firmware Activate code slightly | Matthew Wilcox | 2013-05-08 | 1 | -0/+3
    Add definitions for the three Firmware Activate actions, and change the SCSI translation code to construct the command into a temporary variable instead of translating the endianness back-and-forth.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
    Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
* NVMe: Meta-data support in NVME_IOCTL_SUBMIT_IO | Keith Busch | 2013-05-02 | 1 | -0/+1
    This adds support for namespaces with separate meta-data formats in the submit io ioctl. The meta-data buffer has to be contiguous, so such a buffer is allocated and the mapped user pages are copied to/from this buffer for write/read commands.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Device specific stripe size handling | Keith Busch | 2013-05-02 | 1 | -0/+1
    We have an nvme device that has a concept of a stripe size. IO requests that do not transfer data crossing a stripe boundary have greater performance compared to IO that does cross it. This patch sets the stripe size for the device if the device and vendor ids match one with this feature and splits IO requests that cross the stripe boundary.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
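The boundary check behind such a split can be sketched as below. This is an assumption-laden illustration: it takes the stripe size to be a power of two expressed in bytes, treats a stripe size of 0 as "feature absent", and uses hypothetical function names rather than the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [offset, offset + len) spans a stripe boundary. */
static bool crosses_stripe(uint64_t offset, uint32_t len, uint32_t stripe)
{
    if (stripe == 0)
        return false;                         /* device has no stripe size */
    assert((stripe & (stripe - 1)) == 0);     /* power of two assumed */
    return (offset & (stripe - 1)) + len > stripe;
}

/* Length of the first piece if the request were split at the boundary;
 * the whole length if no split is needed. */
static uint32_t first_chunk_len(uint64_t offset, uint32_t len, uint32_t stripe)
{
    if (!crosses_stripe(offset, len, stripe))
        return len;
    return stripe - (uint32_t)(offset & (stripe - 1));
}
```

For example, with a 128 KiB stripe, an 8 KiB request starting at offset 124 KiB crosses the boundary and its first piece is 4 KiB long.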
* NVMe: Add a character device for each nvme device | Keith Busch | 2013-04-16 | 1 | -0/+5
    Registers a miscellaneous device for each nvme controller probed. This creates character device files as /dev/nvmeN, where N is the device instance, and supports nvme admin ioctl commands so devices without namespaces can be managed.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Fix endian-related problems in user I/O submission path | Matthew Wilcox | 2013-04-16 | 1 | -2/+2
    When constructing the command, dsmgmt needs to be treated as a 32-bit value, not a 16-bit value. reftag, apptag and appmask all need to be converted from native-endian to little-endian. Again, sparse's bitwise warnings caught this problem. Thanks to Keith for pointing out the correct way to fix the reftag.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
    Acked-by: Keith Busch <keith.busch@intel.com>
* NVMe: Abstract out sector to block number conversion | Matthew Wilcox | 2013-04-16 | 1 | -0/+5
    Introduce nvme_block_nr() to help convert sectors to block numbers. This fixes an integer overflow in the SCSI conversion layer, and it's slightly less typing than opencoding it.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
    Acked-by: Keith Busch <keith.busch@intel.com>
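A sketch of what such a conversion helper might look like, assuming 512-byte sectors and a namespace block size of 2^lba_shift bytes. The name `block_nr` and its standalone signature are illustrative; the commit message does not show the body of nvme_block_nr().

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 512-byte sector number to a device block number.
 * The arithmetic stays in 64 bits so large sector numbers are preserved. */
static uint64_t block_nr(uint64_t sector, unsigned lba_shift)
{
    assert(lba_shift >= 9 && lba_shift < 64);
    return sector >> (lba_shift - 9);
}
```

With 4096-byte blocks (lba_shift of 12), eight sectors make up one block, so sector 8 maps to block 1.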
* NVMe: Add nvme-scsi.c | Vishal Verma | 2013-03-28 | 1 | -0/+35
    Translates SCSI commands in SG_IO ioctl to NVMe commands. Uses the scsi-nvme translation spec from nvmexpress.org as reference.
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add definitions for format command | Vishal Verma | 2013-03-27 | 1 | -0/+12
    The SCSI emulation has the ability to send format commands, so we need to add the definition of the command. Also add a missing error code.
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Move structures & definitions to header file | Vishal Verma | 2013-03-27 | 1 | -0/+60
    nvme-scsi.c uses several data structures and definitions that were previously private to nvme-core.c. Move the definitions to nvme.h, protected by __KERNEL__.
    Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add discard support for capable devices | Keith Busch | 2013-03-26 | 1 | -0/+32
    This adds discard support to block queues if the nvme device is capable of deallocating blocks as indicated by the controller's optional command support. A discard flagged bio request will submit an NVMe deallocate Data Set Management command for the requested blocks.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Define SMART log | Keith Busch | 2012-11-13 | 1 | -0/+28
    This data structure is defined in the NVMe specification. It's not used by the kernel, but is available for use by userspace software.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Do not set IO queue depth beyond device max | Keith Busch | 2012-07-27 | 1 | -0/+1
    Set the depth for IO queues to the device's maximum supported queue entries if the requested depth exceeds the device's capabilities.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
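The clamp can be sketched as follows, assuming the maximum queue entries are reported in the low 16 bits of the controller capability register as a zero-based value (the MQES field in the NVMe specification). The function name and parameters are illustrative rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Limit a requested queue depth to what the controller reports it supports. */
static uint32_t clamp_q_depth(uint64_t cap, uint32_t requested)
{
    assert(requested > 0);
    /* The field is zero-based: a stored value of N means N + 1 entries. */
    uint32_t max_entries = (uint32_t)(cap & 0xffff) + 1;
    return requested < max_entries ? requested : max_entries;
}
```

A device reporting 0x7f in that field supports 128 entries, so a request for 1024 entries is reduced to 128.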
* NVMe: Set block queue max sectors | Keith Busch | 2012-07-26 | 1 | -0/+1
    Set the max hw sectors in a namespace's request queue if the nvme device has a max data transfer size.
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update Identify Controller data structure | Matthew Wilcox | 2011-11-04 | 1 | -5/+22
    The driver was still using an old definition of Identify Controller which only came to light once we started using the 'number of namespaces' field properly.
    Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
    Reported-by: Khosrow Panah <Khosrow.Panah@idt.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Implement doorbell stride capability | Matthew Wilcox | 2011-11-04 | 1 | -0/+1
    The doorbell stride allows devices to spread out their doorbells instead of packing them tightly. This feature was added as part of ECN 003.
    This patch also enables support for more than 512 queues :-)
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Rework ioctls | Matthew Wilcox | 2011-11-04 | 1 | -11/+23
    Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD ioctl that can submit any admin command.
    Add a new ID ioctl that returns the namespace ID of the queried device. It corresponds to the SCSI Idlun ioctl.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Time out initialisation after a few seconds | Matthew Wilcox | 2011-11-04 | 1 | -0/+2
    The device reports (in its capability register) how long it will take to initialise. If that time elapses before the ready bit becomes set, conclude the device is broken and refuse to initialise it. Log a nice error message so the user knows why we did nothing.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
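A sketch of the timeout decision, assuming the timeout is encoded in bits 31:24 of the capability register in units of 500 ms (the Timeout field of the NVMe specification). The helper names, and the idea of passing elapsed time and the ready bit in as plain arguments, are illustrative simplifications rather than the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode the worst-case time to become ready, in milliseconds. */
static uint32_t cap_timeout_ms(uint64_t cap)
{
    uint32_t ms = (uint32_t)((cap >> 24) & 0xff) * 500u;
    assert(ms <= 255u * 500u);   /* an 8-bit field cannot exceed 127.5 s */
    return ms;
}

/* True once the advertised time has elapsed without the ready bit set. */
static bool init_timed_out(uint64_t cap, uint32_t elapsed_ms, bool ready)
{
    return !ready && elapsed_ms >= cap_timeout_ms(cap);
}
```

A field value of 20 corresponds to a 10 second allowance; a caller polling the ready bit would give up once `init_timed_out()` returns true.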
* NVMe: Correct the Controller Configuration settings | Matthew Wilcox | 2011-11-04 | 1 | -4/+6
    The arbitration field was extended by one bit, shifting the shutdown notification bits by one. Also, the SQ/CQ entry size was made configurable for future extensions.
    Reported-by: Paul Luse <paul.e.luse@intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Change the definition of nvme_user_io | Matthew Wilcox | 2011-11-04 | 1 | -5/+3
    The read and write commands don't define a 'result', so there's no need to copy it back to userspace.
    Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed through a different ioctl in the future. That removes the need for both the block_shift and nsid arguments.
    Check that the opcode is one of 'read' or 'write'. Opcodes may be added in the future, but we will need a different structure definition for them.
    The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks.
    Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
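The 0-based encoding of nblocks can be illustrated with a short sketch, assuming the field is 16 bits wide as the 65536 figure implies. The helper names are hypothetical and not part of the ioctl interface.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a block count of 1..65536 into a 16-bit zero-based field. */
static uint16_t encode_nblocks(uint32_t count)
{
    assert(count >= 1 && count <= 65536);
    return (uint16_t)(count - 1);
}

/* Decode the zero-based field back into an actual block count. */
static uint32_t decode_nblocks(uint16_t field)
{
    return (uint32_t)field + 1;
}
```

A one-based 16-bit field could express at most 65535 blocks; storing count minus one lets the maximum field value 0xffff stand for 65536, at the cost of being unable to express a zero-length transfer.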
* NVMe: Correct the definitions of two ioctls | Matthew Wilcox | 2011-11-04 | 1 | -2/+2
    NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read.
    Reported-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Remove outdated comments | Matthew Wilcox | 2011-11-04 | 1 | -1/+0
    The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update admin opcodes to match the 1.0RC spec | Krzysztof Wierzbicki | 2011-11-04 | 1 | -7/+7
    Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com>
    Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update BAR structure to match the current spec | Matthew Wilcox | 2011-11-04 | 1 | -2/+4
    Add two reserved registers in the middle of the BAR to match the 1.0 spec plus ECN 0002. Also rename IMC and ISC to INTMC and INTSC to conform with the spec. We still don't need to use them :-)
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add download / activate firmware ioctls | Matthew Wilcox | 2011-11-04 | 1 | -6/+27
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add remaining status codes | Matthew Wilcox | 2011-11-04 | 1 | -0/+15
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add NVME_IOCTL_SUBMIT_IO | Matthew Wilcox | 2011-11-04 | 1 | -0/+18
    Allow userspace to submit synchronous I/O like the SCSI sg interface does.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Make nvme_common_command more featureful | Matthew Wilcox | 2011-11-04 | 1 | -8/+12
    Add prp1, prp2 and the metadata prp to the common command, since the fields are generally used this way.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: New driver | Matthew Wilcox | 2011-11-04 | 1 | -0/+343
    This driver is for devices that follow the NVM Express standard.
    Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>