summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds2018-04-05148-2060/+4266
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
| * blk-mq: Directly schedule q->timeout_work when aborting a requestTejun Heo2018-04-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Request abortion is performed by overriding deadline to now and scheduling timeout handling immediately. For the latter part, the code was using mod_timer(timeout, 0) which can't guarantee that the timer runs afterwards. Let's schedule the underlying work item directly instead. This fixes the hangs during probing reported by Sitsofe but it isn't yet clear to me how the failure can happen reliably if it's just the above described race condition. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sitsofe Wheeler <sitsofe@gmail.com> Reported-by: Meelis Roos <mroos@linux.ee> Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path") Cc: stable@vger.kernel.org # v4.16 Link: http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8NyOF-EMA@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.LRH.2.21.1802261049140.4893@math.ut.ee Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blktrace: fix comment in blktrace_api.hSouvik Banerjee2018-03-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The `__u64 time` field of the blk_io_trace struct refers to the time in nanoseconds, not in microseconds. It is set in __blk_add_trace, which does the following: t->time = ktime_to_ns(ktime_get()); ktime_to_ns returns ktime_t in nanoseconds, not microseconds. Signed-off-by: Souvik Banerjee <souvik1997@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove function name in stringsMatias Bjørling2018-03-291-6/+6
| | | | | | | | | | | | | | | | | | | | For the sysfs functions, the function names are embedded into their error strings. If the function name later changes, the string may not be updated accordingly. Update the strings to use __func__ to avoid this. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: remove some unnecessary NULL checksDan Carpenter2018-03-291-4/+2
| | | | | | | | | | | | | | | | | | | | | | Smatch complains that flush_workqueue() dereferences the work queue pointer but then we check if it's NULL on the next line when it's too late. These NULL checks can be removed because the module won't load if we can't allocate the work queues. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: don't recover unwritten linesHans Holmberg2018-03-291-0/+18
| | | | | | | | | | | | | | | | | | | | | | If the line has not been written to, we should not try to recover any data from it, so check the state of the chunks in the line before attempting to read smeta. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: implement 2.0 supportJavier González2018-03-293-64/+234
| | | | | | | | | | | | | | | | | | Implement 2.0 support in pblk. This includes the address formatting and mapping paths, as well as the sysfs entries for them. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: implement get log report chunkJavier González2018-03-294-82/+298
| | | | | | | | | | | | | | | | | | In preparation of pblk supporting 2.0, implement the get log report chunk in pblk. Also, define the chunk states as given in the 2.0 spec. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: rename ppaf* to addrf*Javier González2018-03-293-14/+14
| | | | | | | | | | | | | | | | | | In preparation for 2.0 support in pblk, rename variables referring to the address format to addrf and reserve ppaf for the 1.2 path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: check for supported versionJavier González2018-03-291-2/+8
| | | | | | | | | | | | | | | | | | | | At this point, only 1.2 spec is supported, thus check for it. Also, since device-side L2P is only supported in the 1.2 spec, make sure to only check its value under 1.2. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: implement get log report chunk helpersJavier González2018-03-294-2/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The 2.0 spec provides a report chunk log page that can be retrieved using the stangard nvme get log page. This replaces the dedicated get/put bad block table in 1.2. This patch implements the helper functions to allow targets retrieve the chunk metadata using get log page. It makes nvme_get_log_ext available outside of nvme core so that we can use it form lightnvm. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: make address conversions depend on generic deviceJavier González2018-03-292-6/+6
| | | | | | | | | | | | | | | | | | On address conversions, use the generic device, instead of the target device. This allows to use conversions outside of the target's realm. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: add support for 2.0 address formatJavier González2018-03-296-48/+95
| | | | | | | | | | | | | | | | | | | | | | | | Add support for 2.0 address format. Also, align address bits for 1.2 and 2.0 to be able to operate on channel and luns without requiring a format conversion. Use a generic address format for this purpose. Also, convert the generic operations to the generic format in pblk. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: normalize geometry nomenclatureJavier González2018-03-297-109/+108
| | | | | | | | | | | | | | | | | | Normalize nomenclature for naming channels, luns, chunks, planes and sectors as well as derivations in order to improve readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: complete geo structure with maxoc*Javier González2018-03-292-0/+19
| | | | | | | | | | | | | | | | | | Complete the generic geometry structure with the maxoc and maxocpu felds, present in the 2.0 spec. Also, expose them through sysfs. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: add shorten OCSSD version in geoJavier González2018-03-292-0/+14
| | | | | | | | | | | | | | | | Create a shorten version to use in the generic geometry. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: add minor version to generic geometryJavier González2018-03-293-8/+24
| | | | | | | | | | | | | | | | | | | | Separate the version between major and minor on the generic geometry and represent it through sysfs in the 2.0 path. The 1.2 path only shows the major version to preserve the existing user space interface. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: simplify geometry structureJavier González2018-03-2912-425/+451
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the device geometry is stored redundantly in the nvm_id and nvm_geo structures at a device level. Moreover, when instantiating targets on a specific number of LUNs, these structures are replicated and manually modified to fit the instance channel and LUN partitioning. Instead, create a generic geometry around nvm_geo, which can be used by (i) the underlying device to describe the geometry of the whole device, and (ii) instances to describe their geometry independently. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: refactor init/exit sequencesJavier González2018-03-291-203/+202
| | | | | | | | | | | | | | | | | | Refactor init and exit sequences to eliminate dependencies among init modules and improve readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: Avoid validation of default op valueHeiner Litz2018-03-291-4/+2
| | | | | | | | | | | | | | | | Fixes: 38401d231de65 ("lightnvm: set target over-provision on create ioctl") Signed-off-by: Heiner Litz <hlitz@ucsc.edu> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: centralize permission check for lightnvm ioctlJohannes Thumshirn2018-03-291-18/+3
| | | | | | | | | | | | | | | | | | | | | | | | Currently all functions for handling the lightnvm core ioctl commands do a check for CAP_SYS_ADMIN. Change this to fail early in nvm_ctl_ioctl(), so we don't have to duplicate the permission checks all over. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: fix bad block initializationHeiner Litz2018-03-291-1/+2
| | | | | | | | | | | | | | | | | | fix reading bad block device information to correctly setup the per line blk_bitmap during lightnvm initialization Signed-off-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: lightnvm: add late setup of block size and metadataMatias Bjørling2018-03-294-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | The nvme driver sets up the size of the nvme namespace in two steps. First it initializes the device with standard logical block and metadata sizes, and then sets the correct logical block and metadata size. Due to the OCSSD 2.0 specification relies on the namespace to expose these sizes for correct initialization, let it be updated appropriately on the LightNVM side as well. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove nvm_dev_ops->max_phys_sectMatias Bjørling2018-03-295-39/+16
| | | | | | | | | | | | | | | | | | The value of max_phys_sect is always static. Instead of defining it in the nvm_dev_ops structure, declare it as a global value. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove max_rq_sizeMatias Bjørling2018-03-292-3/+0
| | | | | | | | | | | | | | The field is no longer used. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: add 2.0 geometry identificationMatias Bjørling2018-03-293-58/+299
| | | | | | | | | | | | | | | | | | Implement the geometry data structures for 2.0 and enable a drive to be identified as one, including exposing the appropriate 2.0 sysfs entries. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: flatten nvm_id_group into nvm_idMatias Bjørling2018-03-293-95/+89
| | | | | | | | | | | | | | | | There are no groups in the 2.0 specification, make sure that the nvm_id structure is flattened before 2.0 data structures are added. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: make 1.2 data structures explicitMatias Bjørling2018-03-291-41/+41
| | | | | | | | | | | | | | | | | | Make the 1.2 data structures explicit, so it will be easy to identify the 2.0 data structures. Also fix the order of which the nvme_nvm_* are declared, such that they follow the nvme_nvm_command order. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: refactor bad block identificationJavier González2018-03-293-109/+109
| | | | | | | | | | | | | | | | | | | | In preparation for the OCSSD 2.0 spec. bad block identification, refactor the current code to generalize bad block get/set functions and structures. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: prevent race in pblk_rb_flush_point_setHans Holmberg2018-03-291-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Make sure that we are not advancing the sync pointer while we're adding bios to the write buffer entry completion list. This race condition results in bios not completing and was identified by a hang when running xfstest generic/113. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: allow allocation of new lines during shutdownHans Holmberg2018-03-291-7/+0
| | | | | | | | | | | | | | | | | | | | | | When shutting down pblk the write buffer is flushed and if the current line can't fit the data in the write buffer we need to allocate a new line, so remove the check that prevents this. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: delete writer kick timer before stopping threadHans Holmberg2018-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | Unless we delete the timer that wakes up the write thread before we stop the thread we risk re-starting the thread, so delete the timer first. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: add padding distribution sysfs attributeHans Holmberg2018-03-294-13/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pblk receives a sync, all data up to that point in the write buffer must be comitted to persistent storage, and as flash memory comes with a minimal write size there is a significant cost involved both in terms of time for completing the sync and in terms of write amplification padded sectors for filling up to the minimal write size. In order to get a better understanding of the costs involved for syncs, Add a sysfs attribute to pblk: padded_dist, showing a normalized distribution of sectors padded. In order to facilitate measurements of specific workloads during the lifetime of the pblk instance, the distribution can be reset by writing 0 to the attribute. Do this by introducing counters for each possible padding: {0..(minimal write size - 1)} and calculate the normalized distribution when showing the attribute. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Rearranged total_buckets statement in pblk_sysfs_get_padding_dist Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove multiple groups in 1.2 data structureMatias Bjørling2018-03-291-2/+3
| | | | | | | | | | | | | | | | Only one id group from the 1.2 specification is supported. Make sure that only the first group is accessible. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove mlc pairs structureMatias Bjørling2018-03-291-13/+1
| | | | | | | | | | | | | | | | | | The known implementations of the 1.2 specification, and upcoming 2.0 implementation all expose a sequential list of pages to write. Remove the data structure, as it is no longer needed. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: export write amplification counters to sysfsHans Holmberg2018-03-298-10/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a SSD, write amplification, WA, is defined as the average number of page writes per user page write. Write amplification negatively affects write performance and decreases the lifetime of the disk, so it's a useful metric to add to sysfs. In plkb's case, the number of writes per user sector is the sum of: (1) number of user writes (2) number of sectors written by the garbage collector (3) number of sectors padded (i.e. due to syncs) This patch adds persistent counters for 1-3 and two sysfs attributes to export these along with WA calculated with five decimals: write_amp_mileage: the accumulated write amplification stats for the lifetime of the pblk instance write_amp_trip: resetable stats to facilitate delta measurements, values reset at creation and if 0 is written to the attribute. 64-bit counters are used as a 32 bit counter would wrap around already after about 17 TB worth of user data. It will take a long long time before the 64 bit sector counters wrap around. The counters are stored after the bad block bitmap in the first emeta sector of each written line. There is plenty of space in the first emeta sector, so we don't need to bump the major version of the line data format. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: check data lines version on recoveryHans Holmberg2018-03-293-5/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a preparation for future bumps of data line persistent storage versions, we need to start checking the emeta line version during recovery. Also slit up the current emeta/smeta version into two bytes (major,minor). Recovering lines with the same major number as the current pblk data line version must succeed. This means that any changes in the persistent format must be: (1) Backward compatible: if we switch back to and older kernel, recovery of lines stored with major == current_major and minor > current_minor must succeed. (2) Forward compatible: switching to a newer kernel, recovery of lines stored with major=current_major and minor < minor must handle the data format differences gracefully(i.e. initialize new data structures to default values). If we detect lines that have a different major number than the current we must abort recovery. The user must manually migrate the data in this case. Previously the version stored in the emeta header was copied from smeta, which has version 1, so we need to set the minor version to 1. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: pblk: handle bad sectors in the emeta area correctlyHans Holmberg2018-03-291-5/+6
| | | | | | | | | | | | | | | | | | | | | | Unless we check if there are bad sectors in the entire emeta-area we risk ending up with valid bitmap / available sector count inconsistency. This results in lines with a bad chunk at the last LUN marked as bad, so go through the whole emeta area and mark up the invalid sectors. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm: remove chnl_offset in nvme_nvm_identityMatias Bjørling2018-03-291-3/+1
| | | | | | | | | | | | | | | | | | | | The identity structure is initialized to zero in the beginning of the nvme_nvm_identity function. The chnl_offset is separately set to zero. Since both the variable and assignment is never changed, remove them. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * lightnvm/pblk-gc: Delete an error message for a failed memory allocation in ↵Markus Elfring2018-03-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | pblk_gc_line_prepare_ws() Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blk-mq: Allow PCI vector offset for mapping queuesKeith Busch2018-03-275-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The PCI interrupt vectors intended to be associated with a queue may not start at 0; a driver may allocate pre_vectors for special use. This patch adds an offset parameter so blk-mq may find the intended affinity mask and updates all drivers using this API accordingly. Cc: Don Brace <don.brace@microsemi.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: <linux-scsi@vger.kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * loop: use killable lock in ioctlsOmar Sandoval2018-03-271-10/+19
| | | | | | | | | | | | | | | | | | | | Even after the previous patch to drop lo_ctl_mutex while calling vfs_getattr(), there are other cases where we can end up sleeping for a long time while holding lo_ctl_mutex. Let's avoid the uninterruptible sleep from the ioctls. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * loop: don't call into filesystem while holding lo_ctl_mutexOmar Sandoval2018-03-271-14/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We hit an issue where a loop device on NFS was stuck in loop_get_status() doing vfs_getattr() after the NFS server died, which caused a pile-up of uninterruptible processes waiting on lo_ctl_mutex. There's no reason to hold this lock while we wait on the filesystem; let's drop it so that other processes can do their thing. We need to grab a reference on lo_backing_file while we use it, and we can get rid of the check on lo_device, which has been unnecessary since commit a34c0ae9ebd6 ("[PATCH] loop: remove the bio remapping capability") in the linux-history tree. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block, bfq: lower-bound the estimated peak rate to 1Paolo Valente2018-03-262-2/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a storage device handled by BFQ happens to be slower than 7.5 KB/s for a certain amount of time (in the order of a second), then the estimated peak rate of the device, maintained in BFQ, becomes equal to 0. The reason is the limited precision with which the rate is represented (details on the range of representable values in the comments introduced by this commit). This leads to a division-by-zero error where the estimated peak rate is used as divisor. Such a type of failure has been reported in [1]. This commit addresses this issue by: 1. Lower-bounding the estimated peak rate to 1 2. Adding and improving comments on the range of rates representable [1] https://www.spinics.net/lists/kernel/msg2739205.html Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: make nvme_get_log_ext non-staticMatias Bjørling2018-03-262-1/+4
| | | | | | | | | | | | | | | | | | Enable the lightnvm integration to use the nvme_get_log_ext() function. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvmet: constify struct nvmet_fabrics_opsChristoph Hellwig2018-03-265-17/+17
| | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvmet: refactor configfs transport type handlingChristoph Hellwig2018-03-261-32/+29
| | | | | | | | | | | | | | | | | | Have a common table of mappings from numerical transport ids to names, and zero the transport specific area in common code in nvmet_addr_trtype_store. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvmet: move device_uuid configfs attr definition to suitable placeMax Gurtovoy2018-03-261-2/+2
| | | | | | | | | | | | Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: Add .stop_ctrl to nvme ctrl opsNitzan Carmi2018-03-263-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistancy reasons, any fabric-specific works (e.g error recovery/reconnect) should be canceled in nvme_stop_ctrl, as for all other NVMe pending works (e.g. scan, keep alive). The patch aims to simplify the logic of the code, as we now only rely on a vague demand from any fabric to flush its private workqueues at the beginning of .delete_ctrl op. Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme-rdma: Allow DELETING state change failure in error_recoveryNitzan Carmi2018-03-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | While error recovery is ongoing, it is OK to move ctrl to DELETING state (from concurrent delete_work). Thus we don't need a warning for that case. Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>