summaryrefslogtreecommitdiffstats
path: root/drivers/lightnvm/pblk-recovery.c
Commit message (Collapse)AuthorAgeFilesLines
* lightnvm: remove set but not used variables 'data_len' and 'rq_len'YueHaibing2019-08-081-2/+1
| | | | | | | | | | | | | | drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc: drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable] drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob: drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable] They are not used since commit 48e5da725581 ("lightnvm: move metadata mapping to lower level driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: move metadata mapping to lower level driverHans Holmberg2019-08-061-34/+5
| | | | | | | | | | Now that blk_rq_map_kern can map both kmem and vmem, move internal metadata mapping down to the lower level driver. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: use nvm_rq_to_ppa_list()Igor Konopko2019-05-061-5/+8
| | | | | | | | | | | This patch replaces few remaining usages of rqd->ppa_list[] with existing nvm_rq_to_ppa_list() helpers. This is needed for theoretical devices with ws_min/ws_opt equal to 1. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: recover only written metadataIgor Konopko2019-05-061-3/+5
| | | | | | | | | This patch ensures that smeta was fully written before even trying to read it based on chunk table state and write pointer. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove internal IO timeoutIgor Konopko2019-05-061-6/+1
| | | | | | | | | | | | Currently during pblk padding, there is internal IO timeout introduced, which is smaller than default NVMe timeout. This can lead to various use-after-free issues. Since in case of any IO timeouts NVMe and block layer will handle timeout by themselves and report it back to use, there is no need to keep this internal timeout in pblk. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: wait for inflight IOs in recoveryIgor Konopko2019-05-061-13/+12
| | | | | | | | | | | | | | | | | This patch changes the behaviour of recovery padding in order to support a case, when some IOs were already submitted to the drive and some next one are not submitted due to error returned. Currently in case of errors we simply exit the pad function without waiting for inflight IOs, which leads to panic on inflight IOs completion. After the changes we always wait for all the inflight IOs before exiting the function. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: propagate errors when reading metaIgor Konopko2019-05-061-1/+1
| | | | | | | | | | | | | Read errors are not correctly propagated. Errors are cleared before returning control to the io submitter. Change the behaviour such that all read errors exept high ecc read warning status is returned appropriately. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: fix update line wp in OOB recoveryIgor Konopko2019-05-061-1/+15
| | | | | | | | | | | | | | In case of OOB recovery, we can hit the scenario when all the data in line were written and some part of emeta was written too. In such a case pblk_update_line_wp() function will call pblk_alloc_page() function which will case to set left_msecs to value below zero (since this field does not track emeta region) and thus will lead to multiple kernel warnings. This patch fixes that issue. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: set propper line as data_line after gcMarcin Dziegielewski2019-05-061-0/+1
| | | | | | | | | | | | In current implementation of l2p recovery, when we are after gc and we have open line, we are not setting current data line properly (we set last line from the device instead of last line ordered by seq_nr) and in consequence, kernel panic and data corruption. Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: reduce L2P memory footprintIgor Konopko2019-05-061-1/+1
| | | | | | | | | | | | | | | | Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: extend line wp balance checkHans Holmberg2019-02-111-18/+38
| | | | | | | | | | | | | | | | | | | | pblk stripes writes of minimal write size across all non-offline chunks in a line, which means that the maximum write pointer delta should not exceed the minimal write size. Extend the line write pointer balance check to cover this case, and ignore the offline chunk wps. This will render us a warning during recovery if something unexpected has happened to the chunk write pointers (i.e. powerloss, a spurious chunk reset, ..). Reported-by: Zhoujie Wu <zjwu@marvell.com> Tested-by: Zhoujie Wu <zjwu@marvell.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: Switch to use new generic UUID APIAndy Shevchenko2019-02-111-3/+5
| | | | | | | | | | | | There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: fix use-after-free bugGustavo A. R. Silva2018-12-221-1/+0
| | | | | | | | | | | | | Remove one of the calls to function bio_put(), so *bio* is only freed once. Notice that bio is being dereferenced in bio_put(), hence leading to a use-after-free bug once *bio* has already been freed. Addresses-Coverity-ID: 1475952 ("Use after free") Fixes: 55d8ec35398e ("lightnvm: pblk: support packed metadata") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: support packed metadataIgor Konopko2018-12-111-4/+13
| | | | | | | | | | | | | | | | | | | pblk performs recovery of open lines by storing the LBA in the per LBA metadata field. Recovery therefore only works for drives that has this field. This patch adds support for packed metadata, which store l2p mapping for open lines in last sector of every write unit and enables drives without per IO metadata to recover open lines. After this patch, drives with OOB size <16B will use packed metadata and metadata size larger than16B will continue to use the device per IO metadata. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: dynamic DMA pool entry sizeIgor Konopko2018-12-111-2/+2
| | | | | | | | | | | | | | | | | Currently lightnvm and pblk uses single DMA pool, for which the entry size always is equal to PAGE_SIZE. The contents of each entry allocated from the DMA pool consists of a PPA list (8bytes * 64), leaving 56bytes * 64 space for metadata. Since the metadata field can be bigger, such as 128 bytes, the static size does not cover this use-case. This patch adds support for I/O metadata above 56 bytes by changing DMA pool size based on device meta size and allows pblk to use OOB metadata >=16B. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add helpers for OOB metadataIgor Konopko2018-12-111-6/+10
| | | | | | | | | | | | | pblk currently assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive in the future. After this patch only OOB metadata equal to 16 bytes is supported. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add comments wrt locking in recovery pathJavier González2018-12-111-0/+3
| | | | | | | | | | pblk's recovery path is single threaded and therefore a number of assumptions regarding concurrency can be made. To avoid confusion, make this explicit with a couple of comments in the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove dead code in pblk_recov_l2pHans Holmberg2018-12-111-1/+0
| | | | | | | | | | Remove the call to pblk_line_replace_data as it returns directly because we have not set l_mg->data_next yet. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: ignore the smeta oob area scanZhoujie Wu2018-12-111-2/+3
| | | | | | | | | | | | The smeta area l2p mapping is empty, and actually the recovery procedure only need to restore data sector's l2p mapping. So ignore the smeta oob scan. Signed-off-by: Zhoujie Wu <zjwu@marvell.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add SPDX license tagJavier González2018-10-091-0/+1
| | | | | | | | Add GLP-2.0 SPDX license tag to all pblk files Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: recover open lines on 2.0 devicesJavier González2018-10-091-279/+121
| | | | | | | | | | | | | | | | | | | In the OCSSD 2.0 spec, each chunk reports its write pointer. This means that pblk does not need to scan open lines to find the write pointer, but instead, it can retrieve it directly (and verify it). This patch uses the write pointer on open lines to (i) recover the line up until the last written lba and (ii) reconstruct the map bitmap and rest of line metadata so that the line can be used for new data. Since the 1.2 path in lightnvm core has been re-implemented to populate the chunk structure and thus recover the write pointer on initialization, this patch removes 1.2 specific recovery, as the 2.0 path can be reused. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: refactor metadata pathsJavier González2018-10-091-2/+2
| | | | | | | | | | | | | | | | | pblk maintains two different metadata paths for smeta and emeta, which store metadata at the start of the line and at the end of the line, respectively. Until now, these path has been common for writing and retrieving metadata, however, as these paths diverge, the common code becomes less clear and unnecessary complicated. In preparation for further changes to the metadata write path, this patch separates the write and read paths for smeta and emeta and removes the synchronous emeta path as it not used anymore (emeta is scheduled asynchronously to prevent jittering due to internal I/Os). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: encapsulate rqd dma allocationsJavier González2018-10-091-20/+10
| | | | | | | | | | dma allocations for ppa_list and meta_list in rqd are replicated in several places across the pblk codebase. Make helpers to encapsulate creation and deletion to simplify the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: fix two sleep-in-atomic-context bugsJia-Ju Bai2018-10-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] nvm_dev_dma_alloc(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 754: nvm_dev_dma_alloc in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p [FUNC] bio_map_kern(GFP_KERNEL) drivers/lightnvm/pblk-core.c, 762: bio_map_kern in pblk_line_submit_smeta_io drivers/lightnvm/pblk-core.c, 1048: pblk_line_submit_smeta_io in pblk_line_init_bb drivers/lightnvm/pblk-core.c, 1434: pblk_line_init_bb in pblk_line_replace_data drivers/lightnvm/pblk-recovery.c, 980: pblk_line_replace_data in pblk_recov_l2p drivers/lightnvm/pblk-recovery.c, 976: spin_lock in pblk_recov_l2p To fix these bugs, the call to pblk_line_replace_data() is moved out of the spinlock protection. These bugs are found by my static analysis tool DSAC. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: calculate line pad distance in helperJavier González2018-10-091-3/+10
| | | | | | | | | If a line is padded, calculate the pad distance directly on the helper being used for this purpose. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: add trace events for line state changesHans Holmberg2018-10-091-0/+6
| | | | | | | | Add trace events for logging for line state changes. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: remove debug from pblk_[down/up]_pageMatias Bjørling2018-10-091-3/+3
| | | | | | | | | | | | | | | | Remove the debug only iteration within __pblk_down_page, which then allows us to reduce the number of arguments down to pblk and the parallel unit from the functions that calls it. Simplifying the callers logic considerably. Also, rename the functions pblk_[down/up]_page to pblk_[down/up]_chunk, to communicate that it manages the write pointer of the chunk. Note that it also protects the parallel unit such that at most one chunk is active per parallel unit. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: allocate line map bitmaps using a mempoolHans Holmberg2018-10-091-1/+1
| | | | | | | | | | | | | | Line map bitmap allocations are fairly large and can fail. Allocation failures are fatal to pblk, stopping the write pipeline. To avoid this, allocate the bitmaps using a mempool instead. Mempool allocations never fail if called from a process context, and pblk *should* only allocate map bitmaps in process context, but keep the failure handling for robustness sake. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: introduce nvm_rq_to_ppa_listHans Holmberg2018-10-091-3/+6
| | | | | | | | | | | | | | | | | | There is a number of places in the lightnvm subsystem where the user iterates over the ppa list. Before iterating, the user must know if it is a single or multiple LBAs due to vector commands using either the nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which leads to open-coding the if/else statement. Instead of having multiple if/else's, move it into a function that can be called by its users. A nice side effect of this cleanup is that this patch fixes up a bunch of cases where we don't consider the single-ppa case in pblk. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: combine 1.2 and 2.0 command flagsMatias Bjørling2018-10-091-10/+4
| | | | | | | | | | | | | | | Add nvm_set_flags helper to enable core to appropriately set the command flags for read/write/erase depending on which version a drive supports. The flags arguments can be distilled into the access hint, scrambling, and program/erase suspend. Replace the access hint with a "is_seq" parameter. The rest of the flags are dependent on the command opcode, which is trivial to detect and set. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: expose generic disk name on pr_* msgsMatias Bjørling2018-07-131-22/+22
| | | | | | | | | | | The error messages in pblk does not say which pblk instance that a message occurred from. Update each error message to reflect the instance it belongs to, and also prefix it with pblk, so we know the message comes from the pblk module. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: enable line minor version detectionMatias Bjørling2018-07-131-2/+3
| | | | | | | | | | | | | | When recovering a line, an extra check was added when debugging was active, such that minor version where also checked. Unfortunately, this used the ifdef NVM_DEBUG, which is not correct. Instead use the proper DEBUG def, and now that it compiles, also fix the variable. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Fixes: d0ab0b1ab991f ("lightnvm: pblk: check data lines version on recovery") Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* treewide: Use array_size() in vzalloc()Kees Cook2018-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* lightnvm: pblk: only try to recover lines with written smetaHans Holmberg2018-06-011-9/+21
| | | | | | | | | | | | | When switching between different lun configurations, there is no guarantee that all lines that contain closed/open chunks have some valid data to recover. Check that the smeta chunk has been written to instead. Also skip bad lines (that does not have enough good chunks). Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: rework write error recovery pathHans Holmberg2018-06-011-91/+0
| | | | | | | | | | | | | | | | | | The write error recovery path is incomplete, so rework the write error recovery handling to do resubmits directly from the write buffer. When a write error occurs, the remaining sectors in the chunk are mapped out and invalidated and the request inserted in a resubmit list. The writer thread checks if there are any requests to resubmit, scans and invalidates any lbas that have been overwritten by later writes and resubmits the failed entries. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: convert to bioset_init()/mempool_init()Kent Overstreet2018-05-301-1/+1
| | | | | | | | Convert lightnvm to embedded bio sets. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: don't recover unwritten linesHans Holmberg2018-03-291-0/+18
| | | | | | | | | | | If the line has not been written to, we should not try to recover any data from it, so check the state of the chunks in the line before attempting to read smeta. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: simplify geometry structureJavier González2018-03-291-7/+7
| | | | | | | | | | | | | | | Currently, the device geometry is stored redundantly in the nvm_id and nvm_geo structures at a device level. Moreover, when instantiating targets on a specific number of LUNs, these structures are replicated and manually modified to fit the instance channel and LUN partitioning. Instead, create a generic geometry around nvm_geo, which can be used by (i) the underlying device to describe the geometry of the whole device, and (ii) instances to describe their geometry independently. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: remove nvm_dev_ops->max_phys_sectMatias Bjørling2018-03-291-6/+2
| | | | | | | | | The value of max_phys_sect is always static. Instead of defining it in the nvm_dev_ops structure, declare it as a global value. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: export write amplification counters to sysfsHans Holmberg2018-03-291-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a SSD, write amplification, WA, is defined as the average number of page writes per user page write. Write amplification negatively affects write performance and decreases the lifetime of the disk, so it's a useful metric to add to sysfs. In plkb's case, the number of writes per user sector is the sum of: (1) number of user writes (2) number of sectors written by the garbage collector (3) number of sectors padded (i.e. due to syncs) This patch adds persistent counters for 1-3 and two sysfs attributes to export these along with WA calculated with five decimals: write_amp_mileage: the accumulated write amplification stats for the lifetime of the pblk instance write_amp_trip: resetable stats to facilitate delta measurements, values reset at creation and if 0 is written to the attribute. 64-bit counters are used as a 32 bit counter would wrap around already after about 17 TB worth of user data. It will take a long long time before the 64 bit sector counters wrap around. The counters are stored after the bad block bitmap in the first emeta sector of each written line. There is plenty of space in the first emeta sector, so we don't need to bump the major version of the line data format. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: check data lines version on recoveryHans Holmberg2018-03-291-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a preparation for future bumps of data line persistent storage versions, we need to start checking the emeta line version during recovery. Also slit up the current emeta/smeta version into two bytes (major,minor). Recovering lines with the same major number as the current pblk data line version must succeed. This means that any changes in the persistent format must be: (1) Backward compatible: if we switch back to and older kernel, recovery of lines stored with major == current_major and minor > current_minor must succeed. (2) Forward compatible: switching to a newer kernel, recovery of lines stored with major=current_major and minor < minor must handle the data format differences gracefully(i.e. initialize new data structures to default values). If we detect lines that have a different major number than the current we must abort recovery. The user must manually migrate the data in this case. Previously the version stored in the emeta header was copied from smeta, which has version 1, so we need to set the minor version to 1. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: ignore high ecc errors on recoveryJavier González2018-01-051-1/+1
| | | | | | | | | On recovery, do not stop L2P recovery if reads report high ECC error as the data is still available. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: use exact free block counter in RLJavier González2018-01-051-3/+1
| | | | | | | | | | | | | | Until now, pblk's rate-limiter has used a heuristic to reserve space for GC I/O given that the over-provision area was fixed. In preparation for allowing to define the over-provision area on target creation, define a dedicated free_block counter in the rate-limiter to track the number of blocks being used for user data. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: refactor emeta consistency checkHans Holmberg2018-01-051-5/+10
| | | | | | | | | | | | | | Currently pblk_recov_get_lba list does two separate things: it checks the consistency of the emeta and extracts the lba list. This patch separates the consistency check to make the code easier to read and to prepare for version checks of the line emeta persistent data format version. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: compress and reorder helper functionsJavier González2018-01-051-10/+10
| | | | | | | | | | Through time, we have generated some redundant helper functions. Refactor them to eliminate redundant and unnecessary code. Also, reorder them to improve readability Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: make geometry structures 2.0 readyMatias Bjørling2018-01-051-1/+1
| | | | | | | | | | Prepare for the 2.0 revision by adapting the geometry structures to coexist with the 1.2 revision. Signed-off-by: Matias Bjørling <m@bjorling.me> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: implement generic path for sync I/OJavier González2017-10-131-28/+3
| | | | | | | | | | Implement a generic path for sending sync I/O on LightNVM. This allows to reuse the standard synchronous path trough blk_execute_rq(), instead of implementing a wait_for_completion on the target side (e.g., pblk). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: consider bad sectors in emeta during recoveryHans Holmberg2017-10-131-14/+30
| | | | | | | | | | | | | | | | | | | | | | When recovering lines we need to consider that bad blocks in a line affect the emeta area size. Previously it was assumed that the emeta area would grow by the number of sectors per page * number of bad blocks in the line. This assumption is not correct - the number of "extra" pages that are consumed could be both smaller (depending on emeta size) and bigger (depending on the placement of the bad blocks). Fix this by calculating the emeta start by iterating backwards through the line, skipping ppas that map to bad blocks. Also fix the data types used for ppa indices/counts in pblk_recov_l2p_from_emeta - we should use u64. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: recover partially written lines correctlyHans Holmberg2017-10-131-4/+4
| | | | | | | | | | | | | When recovering partially written lines, the valid sector count must be decreased by the number of padded sectors in the line. Update line recovery to take all ADDR_EMPTY(padded) sectors into account. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* lightnvm: pblk: refactor rqd alloc/freeJavier González2017-10-131-2/+0
| | | | | | | | | Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>