path: root/drivers/lightnvm/pblk.h
Commit message | Author | Age | Files | Lines
* lightnvm: pblk: simplify partial read path | Igor Konopko | 2019-05-06 | 1 | -15/+3
  This patch changes how the partial read path is handled. In the old approach, merging data from the ring buffer and from the drive was done entirely by pblk. This had disadvantages: the code was complex and relied on bio internals, so it was hard to maintain and strongly dependent on bio changes.

  In the new approach, most of the handling is done by block layer functions such as bio_split(), bio_chain() and generic_make_request(), which is less complex and easier to maintain. Some more details of the new approach follow.

  When a read bio arrives, it is cloned for pblk's internal purposes. All the L2P mapping, which includes copying data from the ring buffer to the bio and thus the bio_advance() calls, is done on the cloned bio, so the original bio is untouched. If we find that we have a partial read, the original bio is still untouched, so we can split it and continue to process only its first part in the current context, while the rest is issued as a separate bio request passed to generic_make_request() for further processing.

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Heiner Litz <hlitz@ucsc.edu>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
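The split decision described in this message can be modelled in userspace. This is a minimal sketch, not pblk code: it assumes the split point is the first sector whose data source (write buffer or media) differs from that of the first sector, and `first_part_len` and the `cached` array are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: cached[i] is true when sector i of a read is served
 * from the write buffer and false when it must be read from the media.
 * Returns how many leading sectors share the source of sector 0, i.e. the
 * length of the first part that would be processed in the current context.
 */
size_t first_part_len(const bool *cached, size_t nr_secs)
{
	size_t i;

	assert(cached != NULL || nr_secs == 0);
	if (nr_secs == 0)
		return 0;

	for (i = 1; i < nr_secs; i++) {
		if (cached[i] != cached[0])
			break;
	}
	return i;
}
```

In this model, the remaining `nr_secs - first_part_len(...)` sectors correspond to the part that the message says is resubmitted as a separate bio.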
* lightnvm: pblk: IO path reorganization | Igor Konopko | 2019-05-06 | 1 | -2/+2
  This patch prepares the read path for the new approach to partial read handling, which is simpler than the previous one.

  The most important change is moving the handling of completed and failed bios from pblk_make_rq() to the particular read and write functions. This is needed because, after the partial read path changes, the completed/failed bio will sometimes be different from the original one, so we can no longer do this in pblk_make_rq().

  The other changes are a small read path refactor to reduce the size of the following patch with the partial read changes.

  Generally, the goal of this patch is not to change functionality, but just to prepare the code for the following changes.

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: GC error handling | Igor Konopko | 2019-05-06 | 1 | -0/+2
  Currently, when there is an IO error (or similar) on the GC read path, pblk still moves the line that was under GC to the free state. Such behaviour can lead to silent data mismatch issues.

  With this patch, a line under GC on which IO errors occurred will be put back in the closed state (instead of the free state, as it was without this patch) and the L2P mapping for the failed sectors will not be updated.

  Then, in case of any user IOs to the failed sectors, pblk will be able to return at least a real IO error instead of stale data, as it does right now.

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove internal IO timeout | Igor Konopko | 2019-05-06 | 1 | -2/+0
  Currently, during pblk padding, an internal IO timeout is used which is smaller than the default NVMe timeout. This can lead to various use-after-free issues. Since NVMe and the block layer handle any IO timeouts by themselves and report them back to us, there is no need to keep this internal timeout in pblk.

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove unused smeta_ssec field | Igor Konopko | 2019-05-06 | 1 | -1/+0
  smeta_ssec field in pblk_line is never used after it was replaced by the function pblk_line_smeta_start().

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: reduce L2P memory footprint | Igor Konopko | 2019-05-06 | 1 | -1/+0
  Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default).

  Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media.

  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
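The "approx. 110MB per 1TB" figure can be reproduced with a short calculation. This is only a sketch: the 4096-byte sector size and the 4-byte L2P entry size are assumptions chosen for illustration and are not stated in the commit message, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of L2P map needed to map capacity_bytes of media. */
uint64_t l2p_map_bytes(uint64_t capacity_bytes, uint32_t sector_bytes,
		       uint32_t entry_bytes)
{
	assert(sector_bytes != 0);
	return capacity_bytes / sector_bytes * entry_bytes;
}

/* Bytes saved by not mapping the over-provisioned share (op_pct percent). */
uint64_t l2p_saved_bytes(uint64_t total_bytes, uint32_t sector_bytes,
			 uint32_t entry_bytes, uint32_t op_pct)
{
	uint64_t full = l2p_map_bytes(total_bytes, sector_bytes, entry_bytes);

	return full * op_pct / 100;
}
```

Under these assumptions, 1 TB (10^12 bytes) of media with 11% over-provisioning gives `l2p_saved_bytes(1000000000000ULL, 4096, 4, 11)`, which is 107421875 bytes, or roughly 107 MB, in line with the figure quoted in the message.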
* lightnvm: pblk: fix race condition on GC | Heiner Litz | 2019-02-11 | 1 | -0/+1
  This patch fixes a race condition where a write is mapped to the last sectors of a line. The write is synced to the device but the L2P is not updated yet. When the line is garbage collected before the L2P update is performed, the sectors are ignored by the GC logic and the line is freed before all sectors are moved. When the L2P is finally updated, it contains a mapping to a freed line, and subsequent reads of the corresponding LBAs fail.

  This patch introduces a per line counter specifying the number of sectors that are synced to the device but have not been updated in the L2P. Lines with a counter greater than zero will not be selected for GC.

  Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
  Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
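The per-line counter can be illustrated with a small userspace model using C11 atomics. This is a hedged sketch of the idea, not the kernel implementation: `struct line`, `pending_l2p` and the three functions are hypothetical names, and the points at which the counter is raised and lowered simply follow the wording of the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a pblk line. */
struct line {
	atomic_int pending_l2p;	/* sectors synced to the device, L2P not yet updated */
};

/* Account for nr_secs sectors that were synced to the device. */
void line_secs_synced(struct line *l, int nr_secs)
{
	assert(nr_secs > 0);
	atomic_fetch_add(&l->pending_l2p, nr_secs);
}

/* Account for nr_secs sectors whose L2P entries have now been updated. */
void line_l2p_updated(struct line *l, int nr_secs)
{
	assert(nr_secs > 0);
	atomic_fetch_sub(&l->pending_l2p, nr_secs);
}

/* GC must skip any line that still has sectors missing from the L2P. */
bool line_gc_eligible(struct line *l)
{
	return atomic_load(&l->pending_l2p) == 0;
}
```

A GC victim selector would call `line_gc_eligible()` before picking a line, so a line is never freed while an L2P update that points into it is still outstanding.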
* lightnvm: pblk: prevent stall due to wb threshold | Javier González | 2019-02-11 | 1 | -1/+1
  In order to respect mw_cunits, pblk's write buffer maintains a backpointer to protect data not yet persisted; when writing to the write buffer, this backpointer defines a threshold that pblk's rate-limiter enforces.

  On small PU configurations, the following scenarios might take place: (i) the threshold is larger than the write buffer and (ii) the threshold is smaller than the write buffer, but larger than the maximum allowed split bio - 256KB at this moment (Note that writes are not always split - we only do this when the size of the buffer is smaller than the buffer). In both cases, pblk's rate-limiter prevents the I/O from being written to the buffer, thus stalling.

  This patch fixes the original backpointer implementation by considering the threshold both on buffer creation and on the rate-limiter's path, when bio_split is triggered (case (ii) above).

  Fixes: 766c8ceb16fc ("lightnvm: pblk: guarantee that backpointer is respected on writer stall")
  Signed-off-by: Javier González <javier@javigon.com>
  Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: Switch to use new generic UUID API | Andy Shevchenko | 2019-02-11 | 1 | -9/+1
  There are new types and helpers that are supposed to be used in new code.

  As a preparation to get rid of legacy types and API functions do the conversion here.

  Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: Use u64 instead of __le64 for CPU visible side | Andy Shevchenko | 2019-02-11 | 1 | -2/+2
  Sparse complains about using strict data types:

  drivers/lightnvm/pblk-read.c:254:43: warning: incorrect type in assignment (different base types)
  drivers/lightnvm/pblk-read.c:254:43: expected restricted __le64 <noident>
  drivers/lightnvm/pblk-read.c:254:43: got unsigned long long [unsigned] [usertype] <noident>
  drivers/lightnvm/pblk-read.c:255:29: warning: cast from restricted __le64
  drivers/lightnvm/pblk-read.c:268:29: warning: cast from restricted __le64
  drivers/lightnvm/pblk-read.c:328:41: warning: incorrect type in assignment (different base types)
  drivers/lightnvm/pblk-read.c:328:41: expected restricted __le64 <noident>
  drivers/lightnvm/pblk-read.c:328:41: got unsigned long long [unsigned] [usertype] <noident>

  In the code it seems explicit that lba_list_mem and lba_list_media members of struct pblk_pr_ctx are used on CPU side, which means they should not be of strict types.

  Change types of lba_list_mem and lba_list_media members to be u64.

  Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: do not overwrite ppa list with meta list | Igor Konopko | 2018-12-11 | 1 | -2/+5
  When using pblk with 0 sized metadata, both the ppa list and the meta list point to the same memory, since pblk_dma_meta_size() returns 0 in that case.

  This patch fixes that issue by ensuring that pblk_dma_meta_size() always returns space equal to sizeof(struct pblk_sec_meta), so that the ppa list and the meta list point to different memory addresses.

  Even though in that case the drive does not really care about the meta_list pointer, this is the easiest way to fix the issue without introducing changes in many places in the code just for the 0 sized metadata case.

  The same approach also needs to be taken for pblk_get_sec_meta(), since we also cannot point to the same memory address in the meta buffer when we are using it for the pblk recovery process.

  Reported-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Tested-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
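One way to read this fix is as a lower bound on the per-sector metadata size used when laying out the DMA buffer. The sketch below assumes that interpretation; `dma_meta_size`, `struct sec_meta` and `MAX_VLBA` are hypothetical stand-ins, with the 16-byte metadata size and the 64-entry list length taken from neighbouring commit messages in this log rather than from this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VLBA 64	/* assumed number of sectors per vector command */

/* Assumed 16-byte per-sector metadata layout, for illustration only. */
struct sec_meta {
	uint64_t reserved;
	uint64_t lba;
};

static_assert(sizeof(struct sec_meta) == 16, "sketch assumes 16-byte metadata");

/* Size of the meta list region; never 0, so it cannot alias the ppa list. */
size_t dma_meta_size(size_t oob_meta_size)
{
	size_t per_sec = oob_meta_size;

	if (per_sec < sizeof(struct sec_meta))
		per_sec = sizeof(struct sec_meta);
	return per_sec * MAX_VLBA;
}
```

With a non-zero size reserved even for 0 sized metadata, the pointer that follows the meta list region ends up at a distinct address, which is the property the message asks for.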
* lightnvm: pblk: support packed metadata | Igor Konopko | 2018-12-11 | 1 | -1/+9
  pblk performs recovery of open lines by storing the LBA in the per LBA metadata field. Recovery therefore only works for drives that have this field.

  This patch adds support for packed metadata, which stores the l2p mapping for open lines in the last sector of every write unit and enables drives without per IO metadata to recover open lines.

  After this patch, drives with OOB size <16B will use packed metadata, and drives with a metadata size of 16B or larger will continue to use the device per IO metadata.

  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: dynamic DMA pool entry size | Igor Konopko | 2018-12-11 | 1 | -1/+5
  Currently lightnvm and pblk use a single DMA pool, for which the entry size is always equal to PAGE_SIZE. The contents of each entry allocated from the DMA pool consist of a PPA list (8 bytes * 64), leaving 56 bytes * 64 of space for metadata. Since the metadata field can be bigger, such as 128 bytes, the static size does not cover this use-case.

  This patch adds support for I/O metadata above 56 bytes by changing the DMA pool size based on the device meta size and allows pblk to use OOB metadata >=16B.

  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
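The sizes quoted in this message follow from simple arithmetic, sketched below. The 4096-byte page size is an assumption (PAGE_SIZE is architecture dependent), and the macro and function names are hypothetical; only the 8-byte PPA entries and the list length of 64 come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VLBA	64	/* entries per list, from the message */
#define PPA_BYTES	8	/* bytes per PPA list entry, from the message */
#define PAGE_BYTES	4096	/* assumed PAGE_SIZE */

static_assert(PPA_BYTES * MAX_VLBA < PAGE_BYTES, "PPA list must fit in a page");

/* Bytes needed for one DMA pool entry: PPA list followed by the meta list. */
size_t dma_entry_size(size_t oob_meta_size)
{
	return (PPA_BYTES + oob_meta_size) * MAX_VLBA;
}

/* Largest per-sector metadata that still fits a fixed one-page entry. */
size_t max_meta_in_page(void)
{
	return (PAGE_BYTES - PPA_BYTES * MAX_VLBA) / MAX_VLBA;
}
```

Here `max_meta_in_page()` evaluates to 56, matching the "56 bytes * 64" in the message, while 128-byte metadata needs `dma_entry_size(128)`, which is 8704 bytes and therefore more than one 4096-byte page.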
* lightnvm: pblk: add helpers for OOB metadata | Igor Konopko | 2018-12-11 | 1 | -0/+6
  pblk currently assumes that the size of OOB metadata on the drive is always equal to the size of the pblk_sec_meta struct. This commit adds helpers that will allow handling different sizes of OOB metadata on the drive in the future.

  After this patch only OOB metadata equal to 16 bytes is supported.

  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: move lba list to partial read context | Igor Konopko | 2018-12-11 | 1 | -0/+2
  Currently DMA allocated memory is reused on partial read for the lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move these arrays into pblk_pr_ctx structures.

  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: set conservative threshold for user writes | Hans Holmberg | 2018-12-11 | 1 | -1/+11
  In a worst-case scenario (random writes), OP% of sectors in each line will be invalid, and we will then need to move data out of 100/OP% lines to free a single line.

  So, to prevent the possibility of running out of lines, temporarily block user writes when there are fewer than 100/OP% free lines.

  Also ensure that pblk creation does not produce instances with insufficient over-provisioning.

  Insufficient over-provisioning is not a problem on real hardware, but often an issue when running QEMU simulations (with few lines). 100 lines is enough to create a sane instance with the standard (11%) over-provisioning.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
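The threshold can be worked through as a small calculation: if only OP% of each victim line is invalid, garbage collecting one line reclaims OP% of a line, so about 100/OP lines must be processed to reclaim a whole one. The sketch below assumes the quotient is rounded up, which is the conservative choice but is not stated in the message; `min_free_lines` is a hypothetical name.

```c
#include <assert.h>

/* Free-line threshold below which user writes are temporarily blocked. */
unsigned int min_free_lines(unsigned int op_pct)
{
	assert(op_pct > 0 && op_pct <= 100);

	/* ceil(100 / op_pct); the rounding direction is an assumption. */
	return (100 + op_pct - 1) / op_pct;
}
```

With the default 11% over-provisioning this gives a threshold of 10 lines, which also suggests why an instance with only a handful of lines (as in small QEMU setups) cannot be provisioned sanely.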
* lightnvm: pblk: stop writes gracefully when running out of lines | Hans Holmberg | 2018-12-11 | 1 | -2/+2
  If mapping fails (i.e. when running out of lines), handle the error and stop writing.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@javigon.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: guarantee that backpointer is respected on writer stall | Javier González | 2018-10-09 | 1 | -1/+7
  pblk's write buffer must guarantee that it respects the device's constraints for reads (i.e., mw_cunits). This is done by maintaining a backpointer that updates the L2P table as entries wrap up, making them point to the media instead of pointing to the write buffer.

  This mechanism can race in case the write thread stalls, as the write pointer will protect the last written entry, thus disregarding the read constraints.

  This patch adds an extra check on wrap up, making sure that the threshold is respected at all times, preventing new entries from overwriting committed data, also in case of a write thread stall.

  Reported-by: Heiner Litz <hlitz@ucsc.edu>
  Signed-off-by: Javier González <javier@cnexlabs.com>
  Reviewed-by: Heiner Litz <hlitz@ucsc.edu>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: move ring buffer alloc/free rb init | Javier González | 2018-10-09 | 1 | -5/+2
  pblk's read/write buffer currently takes a buffer and its size and uses it to create the metadata around it to use it as a ring buffer. This puts the responsibility of allocating/freeing ring buffer memory on the ring buffer user. Instead, move it inside of the ring buffer helpers (pblk-rb.c). This simplifies creation/destruction routines.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: encapsulate rb pointer operations | Javier González | 2018-10-09 | 1 | -0/+2
  pblk's read/write buffer is always a power-of-2, thus wrapping up the buffer can be done with a bit mask. Since this is an implementation detail internal to the write buffer, make a helper that hides pointer increment + wrap, and allows to transparently relax this assumption in the future.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
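The increment-and-wrap helper described here relies on a standard trick: for a power-of-two size N, `x & (N - 1)` equals `x % N`. A minimal userspace sketch follows; the function name and signature are illustrative and are not claimed to match the helper added by this commit.

```c
#include <assert.h>

/* Advance ring position p by nr entries; nr_entries must be a power of two. */
unsigned int rb_ptr_wrap(unsigned int nr_entries, unsigned int p,
			 unsigned int nr)
{
	assert(nr_entries != 0 && (nr_entries & (nr_entries - 1)) == 0);
	return (p + nr) & (nr_entries - 1);
}
```

Keeping the mask inside one helper means that callers never open-code the wrap, so relaxing the power-of-two assumption later would only require changing this one function (for example to a modulo).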
* lightnvm: pblk: remove unused function | Javier González | 2018-10-09 | 1 | -2/+0
  Removed unused function in pblk-rb.c

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add SPDX license tag | Javier González | 2018-10-09 | 1 | -0/+1
  Add GPL-2.0 SPDX license tag to all pblk files

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: take write semaphore on metadata | Javier González | 2018-10-09 | 1 | -0/+1
  pblk guarantees write ordering at a chunk level through a per open chunk semaphore. At this point, since we only have an open I/O stream for both user and GC data, the semaphore is per parallel unit.

  For the metadata I/O that is synchronous, the semaphore is not needed as ordering is guaranteed. However, if the metadata scheme changes or multiple streams are open, this guarantee might not be preserved.

  This patch makes sure that all writes go through the semaphore, even for synchronous I/O. This is consistent with pblk's write I/O model. It also simplifies maintenance since changes in the metadata scheme could cause ordering issues.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: refactor metadata paths | Javier González | 2018-10-09 | 1 | -2/+2
  pblk maintains two different metadata paths for smeta and emeta, which store metadata at the start of the line and at the end of the line, respectively. Until now, these paths have been common for writing and retrieving metadata; however, as these paths diverge, the common code becomes less clear and unnecessarily complicated.

  In preparation for further changes to the metadata write path, this patch separates the write and read paths for smeta and emeta and removes the synchronous emeta path as it is not used anymore (emeta is scheduled asynchronously to prevent jittering due to internal I/Os).

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: encapsulate rqd dma allocations | Javier González | 2018-10-09 | 1 | -0/+2
  dma allocations for ppa_list and meta_list in rqd are replicated in several places across the pblk codebase. Make helpers to encapsulate creation and deletion to simplify the code.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: calculate line pad distance in helper | Javier González | 2018-10-09 | 1 | -8/+0
  If a line is padded, calculate the pad distance directly on the helper being used for this purpose.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: move ppa transformations to core | Javier González | 2018-10-09 | 1 | -74/+4
  Continuing the effort of moving 1.2 and 2.0 specific code to core, move 64_to_32 and 32_to_64 ppa helpers from pblk to core.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add tracing for chunk resets | Hans Holmberg | 2018-10-09 | 1 | -0/+6
  Trace state of chunk resets.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add trace events for chunk states | Hans Holmberg | 2018-10-09 | 1 | -0/+8
  Introduce trace points for tracking chunk states in pblk - this is useful for inspection of the entire state of the drive, and real handy for both fw and pblk debugging.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove debug from pblk_[down/up]_page | Matias Bjørling | 2018-10-09 | 1 | -3/+3
  Remove the debug-only iteration within __pblk_down_page, which then allows us to reduce the number of arguments down to pblk and the parallel unit in the functions that call it, simplifying the callers' logic considerably.

  Also, rename the functions pblk_[down/up]_page to pblk_[down/up]_chunk, to communicate that it manages the write pointer of the chunk. Note that it also protects the parallel unit such that at most one chunk is active per parallel unit.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove unused parameters in pblk_up_rq | Hans Holmberg | 2018-10-09 | 1 | -2/+1
  The parameters nr_ppas and ppa_list are not used, so remove them.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: allocate line map bitmaps using a mempool | Hans Holmberg | 2018-10-09 | 1 | -0/+4
  Line map bitmap allocations are fairly large and can fail. Allocation failures are fatal to pblk, stopping the write pipeline. To avoid this, allocate the bitmaps using a mempool instead.

  Mempool allocations never fail if called from a process context, and pblk *should* only allocate map bitmaps in process context, but keep the failure handling for robustness' sake.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: introduce nvm_rq_to_ppa_list | Hans Holmberg | 2018-10-09 | 1 | -3/+1
  There are a number of places in the lightnvm subsystem where the user iterates over the ppa list. Before iterating, the user must know whether it is a single LBA or multiple LBAs, because vector commands use either the nvm_rq ->ppa_addr or ->ppa_list field on command submission, which leads to open-coding the if/else statement.

  Instead of having multiple if/else's, move it into a function that can be called by its users.

  A nice side effect of this cleanup is that this patch fixes up a bunch of cases where we don't consider the single-ppa case in pblk.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
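The if/else that this helper hides can be sketched with simplified userspace types. The structures below are illustrative stand-ins, not the kernel definitions; in particular the `nr_ppas` count field and the `_sketch` names are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ppa_addr {
	uint64_t ppa;
};

/* Simplified request holding only the fields this example needs. */
struct nvm_rq_sketch {
	struct ppa_addr ppa_addr;	/* used for a single-address command */
	struct ppa_addr *ppa_list;	/* used for a multi-address command */
	unsigned int nr_ppas;		/* assumed count of addresses */
};

/* Return a pointer that can be iterated uniformly in both cases. */
struct ppa_addr *rq_to_ppa_list_sketch(struct nvm_rq_sketch *rqd)
{
	assert(rqd != NULL);
	return rqd->nr_ppas > 1 ? rqd->ppa_list : &rqd->ppa_addr;
}
```

Callers can then loop `for (i = 0; i < rqd->nr_ppas; i++)` over the returned pointer without caring which field backs it, which is how the single-ppa case stops being forgotten.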
* lightnvm: pblk: remove unused variable. | Javier González | 2018-10-09 | 1 | -1/+0
  Removed unused struct ppa_addr variable.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: improve line helpers | Javier González | 2018-10-09 | 1 | -4/+9
  The current helper to obtain a line from a ppa returns the line id, which requires its users to explicitly retrieve the pointer to the line with the id.

  Make 2 different helpers: one returning the line id and one returning the line directly.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add helpers for chunk addresses | Javier González | 2018-10-09 | 1 | -0/+19
  Implement helpers to go from ppas to a chunk within a line and an address within a chunk. These helpers will be used on the patches adding trace support in pblk, which will be sent in this window.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: refactor put line fn on read completion | Matias Bjørling | 2018-10-09 | 1 | -0/+2
  The read completion path uses the put_line variable to decide whether the reference on a line should be released. The function name used for that is pblk_read_put_rqd_kref, which could lead one to believe that it is the rqd that is releasing the reference, while it is the line reference that is put.

  Rename and also split the function in two to account for either rqd or single ppa callers and move it to core, such that it later can be used in the write path as well.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Reviewed-by: Heiner Litz <hlitz@ucsc.edu>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: unify vector max req constants | Matias Bjørling | 2018-10-09 | 1 | -6/+4
  Both NVM_MAX_VLBA and PBLK_MAX_REQ_ADDRS define how many LBAs are available in a vector command. pblk uses them interchangeably in its implementation. Use NVM_MAX_VLBA as the main one and remove usages of PBLK_MAX_REQ_ADDRS.

  Also remove the power representation that only has one user, and instead calculate it at runtime.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: move bad block and chunk state logic to core | Matias Bjørling | 2018-10-09 | 1 | -1/+1
  pblk implements two data paths for recovering line state, one for 1.2 and another for 2.0. Instead of having pblk implement these, combine them in the core to reduce complexity and make them available to other targets.

  The new interface will adhere to the 2.0 chunk definition, including managing open chunks with an active write pointer. To provide this interface, a 1.2 device recovers the state of the chunks by manually detecting if a chunk is either free/open/close/offline, and if open, scanning the flash pages sequentially to find the next writeable page. This process takes on average ~10 seconds on a device with 64 dies, 1024 blocks and 60us read access time. The process can be parallelized but is left out for maintenance simplicity, as the 1.2 specification is deprecated.

  For 2.0 devices, the logic is maintained internally in the drive and retrieved through the 2.0 interface.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: combine 1.2 and 2.0 command flags | Matias Bjørling | 2018-10-09 | 1 | -38/+0
  Add nvm_set_flags helper to enable core to appropriately set the command flags for read/write/erase depending on which version a drive supports.

  The flags arguments can be distilled into the access hint, scrambling, and program/erase suspend. Replace the access hint with an "is_seq" parameter. The rest of the flags are dependent on the command opcode, which is trivial to detect and set.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add asynchronous partial read | Heiner Litz | 2018-07-13 | 1 | -0/+10
  In the read path, partial reads are currently performed synchronously which affects performance for workloads that generate many partial reads. This patch adds an asynchronous partial read path as well as the required partial read ctx.

  Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
  Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: expose generic disk name on pr_* msgs | Matias Bjørling | 2018-07-13 | 1 | -9/+20
  The error messages in pblk do not say which pblk instance a message came from. Update each error message to reflect the instance it belongs to, and also prefix it with pblk, so we know the message comes from the pblk module.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: move NVM_DEBUG to pblk | Matias Bjørling | 2018-07-13 | 1 | -3/+3
  There are no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable only for pblk.

  Also fix up the CONFIG_NVM_PBLK entry to follow the code style for Kconfig files.

  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: handle case when mw_cunits equals to 0 | Marcin Dziegielewski | 2018-07-13 | 1 | -3/+0
  Some devices can expose mw_cunits equal to 0, which can cause the creation of a write buffer that is too small and cause performance to drop on write workloads.

  Additionally, the write buffer size must cover write data requirements, such as WS_MIN and MW_CUNITS - it must be greater than or equal to the larger one multiplied by the number of PUs. However, for performance reasons, use the WS_OPT value in the calculation instead of WS_MIN.

  Because the place where the buffer size is calculated was changed, this patch also removes the pgs_in_buffer field in the pblk structure.

  Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
  Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
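The sizing rule in this message (the larger of WS_OPT and MW_CUNITS, multiplied by the number of PUs) can be written as a one-line calculation. This is a sketch of the stated lower bound only, assuming all quantities are in sectors; `min_wb_secs` is a hypothetical name and any further rounding the driver may apply is not modelled.

```c
#include <assert.h>

/* Lower bound on the write buffer size, in sectors, per the stated rule. */
unsigned int min_wb_secs(unsigned int ws_opt, unsigned int mw_cunits,
			 unsigned int nr_pus)
{
	unsigned int per_pu = ws_opt > mw_cunits ? ws_opt : mw_cunits;

	assert(nr_pus > 0);
	return per_pu * nr_pus;
}
```

For a device that reports `mw_cunits == 0`, the bound falls back to `ws_opt * nr_pus` rather than collapsing to zero, which is the case this commit addresses.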
* lightnvm: pblk: kick writer on new flush points | Hans Holmberg | 2018-06-01 | 1 | -0/+1
  Unless we kick the writer directly when setting a new flush point, the user risks having to wait for up to one second (the default timeout for the write thread to be kicked) for the IO to complete.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: garbage collect lines with failed writes | Hans Holmberg | 2018-06-01 | 1 | -4/+21
  Write failures should not happen under normal circumstances, so in order to bring the chunk back into a known state as soon as possible, evacuate all the valid data out of the line and let the fw judge if the block can be written to in the next reset cycle.

  Do this by introducing a new gc list for lines with failed writes, and ensure that the rate limiter allocates a small portion of the write bandwidth to get the job done.

  The lba list is saved in memory for use during gc as we cannot guarantee that the emeta data is readable if a write error occurred.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: rework write error recovery path | Hans Holmberg | 2018-06-01 | 1 | -8/+3
  The write error recovery path is incomplete, so rework the write error recovery handling to do resubmits directly from the write buffer.

  When a write error occurs, the remaining sectors in the chunk are mapped out and invalidated and the request inserted in a resubmit list.

  The writer thread checks if there are any requests to resubmit, scans and invalidates any lbas that have been overwritten by later writes and resubmits the failed entries.

  Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
  Reviewed-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove dead function | Javier González | 2018-06-01 | 1 | -1/+0
  Remove dead function for manual sync. I/O

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pass flag on graceful teardown to targets | Javier González | 2018-06-01 | 1 | -1/+3
  If the namespace is unregistered before the LightNVM target is removed (e.g., on hot unplug) it is too late for the target to store any metadata on the device - any attempt to write to the device will fail. In this case, pass on a "graceful teardown" flag to the target to let it know when this happens.

  In the case of pblk, we pad the open line (close all open chunks) to improve data retention. In the event of an ungraceful shutdown, avoid this part and just clean up.

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove unnecessary argument | Javier González | 2018-06-01 | 1 | -1/+1
  Remove unnecessary argument on pblk_line_free()

  Signed-off-by: Javier González <javier@cnexlabs.com>
  Signed-off-by: Matias Bjørling <mb@lightnvm.io>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>