summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-table.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-4.12/dm-changes' of ↵Linus Torvalds2017-05-031-38/+61
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A major update for DM cache that reduces the latency for deciding whether blocks should migrate to/from the cache. The bio-prison-v2 interface supports this improvement by enabling direct dispatch of work to workqueues rather than having to delay the actual work dispatch to the DM cache core. So the dm-cache policies are much more nimble by being able to drive IO as they see fit. One immediate benefit from the improved latency is a cache that should be much more adaptive to changing workloads. - Add a new DM integrity target that emulates a block device that has additional per-sector tags that can be used for storing integrity information. - Add a new authenticated encryption feature to the DM crypt target that builds on the capabilities provided by the DM integrity target. - Add MD interface for switching the raid4/5/6 journal mode and update the DM raid target to use it to enable aid4/5/6 journal write-back support. - Switch the DM verity target over to using the asynchronous hash crypto API (this helps work better with architectures that have access to off-CPU algorithm providers, which should reduce CPU utilization). - Various request-based DM and DM multipath fixes and improvements from Bart and Christoph. - A DM thinp target fix for a bio structure leak that occurs for each discard IFF discard passdown is enabled. - A fix for a possible deadlock in DM bufio and a fix to re-check the new buffer allocation watermark in the face of competing admin changes to the 'max_cache_size_bytes' tunable. - A couple DM core cleanups. * tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits) dm bufio: check new buffer allocation watermark every 30 seconds dm bufio: avoid a possible ABBA deadlock dm mpath: make it easier to detect unintended I/O request flushes dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit() dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH dm: introduce enum dm_queue_mode to cleanup related code dm mpath: verify __pg_init_all_paths locking assumptions at runtime dm: verify suspend_locking assumptions at runtime dm block manager: remove an unused argument from dm_block_manager_create() dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() dm mpath: delay requeuing while path initialization is in progress dm mpath: avoid that path removal can trigger an infinite loop dm mpath: split and rename activate_path() to prepare for its expanded use dm ioctl: prevent stack leak in dm ioctl call dm integrity: use previously calculated log2 of sectors_per_block dm integrity: use hex2bin instead of open-coded variant dm crypt: replace custom implementation of hex2bin() dm crypt: remove obsolete references to per-CPU state dm verity: switch to using asynchronous hash crypto API dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues ...
| * dm: introduce enum dm_queue_mode to cleanup related codeBart Van Assche2017-04-271-7/+7
| | | | | | | | | | | | | | | | | | Introduce an enumeration type for the queue mode. This patch does not change any functionality but makes the DM code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: verify suspend_locking assumptions at runtimeBart Van Assche2017-04-271-0/+4
| | | | | | | | | | | | | | | | | | Ensure that the assumptions about the caller holding suspend_lock are checked at runtime if lockdep is enabled. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: mark targets that pass integrity dataMikulas Patocka2017-04-241-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A dm-crypt on dm-integrity device incorrectly advertises an integrity profile on the DM crypt device. It can be seen in the files "/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target advertise the integrity profile. That is incorrect, only the dm-integrity target should advertise the integrity profile. A general problem in DM is that if we have a DM device that depends on another device with an integrity profile, the upper device will always advertise the integrity profile, even when the target driver doesn't support handling integrity data. Most targets don't support integrity data, so we provide a whitelist of targets that support it (linear, delay and striped). The targets that support passing integrity data to the lower device are marked with the flag DM_TARGET_PASSES_INTEGRITY. The DM core will now advertise integrity data on a DM device only if all the targets support the integrity data. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm table: replace while loops with for loopsMikulas Patocka2017-04-241-31/+32
| | | | | | | | | | | | | | Also remove some unnecessary use of uninitialized_var(). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm table: add flag to allow target to handle its own integrity metadataMilan Broz2017-03-071-0/+11
| | | | | | | | | | | | | | | | Add DM_TARGET_INTEGRITY flag that specifies bio integrity metadata is not inherited but implemented in the target itself. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | block: remove the discard_zeroes_data flagChristoph Hellwig2017-04-081-19/+0
| | | | | | | | | | | | | | | | | | | | Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | dm: support REQ_OP_WRITE_ZEROESChristoph Hellwig2017-04-081-0/+30
|/ | | | | | | | Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: Use pointer to backing_dev_info from request_queueJan Kara2017-02-021-1/+1
| | | | | | | | | | | We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
* dm table: simplify dm_table_determine_type()Bart Van Assche2016-12-081-13/+8
| | | | | | | | Use a single loop instead of two loops to determine whether or not all_blk_mq has to be set. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM deviceBart Van Assche2016-12-081-0/+5
| | | | | | | | | | | | When dm_table_set_type() is used by a target to establish a DM table's type (e.g. DM_TYPE_MQ_REQUEST_BASED in the case of DM multipath) the DM core must go on to verify that the devices in the table are compatible with the established type. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: fix 'all_blk_mq' inconsistency when an empty table is loadedMike Snitzer2016-12-081-6/+13
| | | | | | | | | | | | | | | | | An earlier DM multipath table could have been build ontop of underlying devices that were all using blk-mq. In that case, if that active multipath table is replaced with an empty DM multipath table (that reflects all paths have failed) then it is important that the 'all_blk_mq' state of the active table is transfered to the new empty DM table. Otherwise dm-rq.c:dm_old_prep_tio() will incorrectly clone a request that isn't needed by the DM multipath target when it is to issue IO to an underlying blk-mq device. Fixes: e83068a5 ("dm mpath: add optional "queue_mode" feature") Cc: stable@vger.kernel.org # 4.8+ Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: fix missing dm_put_target_type() in dm_table_add_target()tang.junhui2016-10-241-15/+9
| | | | | | | | | | | | | | | | | dm_get_target_type() was previously called so any error returned from dm_table_add_target() must first call dm_put_target_type(). Otherwise the DM target module's reference count will leak and the associated kernel module will be unable to be removed. Also, leverage the fact that r is already -EINVAL and remove an extra newline. Fixes: 36a0456 ("dm table: add immutable feature") Fixes: cc6cbe1 ("dm table: add always writeable feature") Fixes: 3791e2f ("dm table: add singleton feature") Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm error: add DAX supportMike Snitzer2016-07-201-1/+2
| | | | | | Allow the error target to replace an existing DAX-enabled target. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add infrastructure for DAX supportToshi Kani2016-07-201-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | Change mapped device to implement direct_access function, dm_blk_direct_access(), which calls a target direct_access function. 'struct target_type' is extended to have target direct_access interface. This function limits direct accessible size to the dm_target's limit with max_io_len(). Add dm_table_supports_dax() to iterate all targets and associated block devices to check for DAX support. To add DAX support to a DM target the target must only implement the direct_access function. Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped device supports DAX and is bio based. This new type is used to assure that all target devices have DAX support and remain that way after QUEUE_FLAG_DAX is set in mapped device. At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting DM_TYPE_DAX_BIO_BASED to the type. Any subsequent table load to the mapped device must have the same type, or else it fails per the check in table_load(). Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm mpath: add optional "queue_mode" featureMike Snitzer2016-06-101-22/+45
| | | | | | | | | | | | | | | | | | | | Allow a user to specify an optional feature 'queue_mode <mode>' where <mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based, request_fn rq-based, and blk-mq rq-based respectively. If the queue_mode feature isn't specified the default for the "multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y it'll default to mode "mq". This new queue_mode feature introduces the ability for each multipath device to have its own queue_mode (whereas before this feature all multipath devices effectively had to have the same queue_mode). This commit also goes a long way to eliminate the awkward (ab)use of DM_TYPE_*, the associated filter_md_type() and other relatively fragile and difficult to maintain code. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: move request-based code out to dm-rq.[hc]Mike Snitzer2016-06-101-1/+1
| | | | | | | | | | | | | | Add some seperation between bio-based and request-based DM core code. 'struct mapped_device' and other DM core only structures and functions have been moved to dm-core.h and all relevant DM core .c files have been updated to include dm-core.h rather than dm.h DM targets should _never_ include dm-core.h! [block core merge conflict resolution from Stephen Rothwell] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
* block: kill off q->flush_flagsJens Axboe2016-04-131-6/+6
| | | | | | | | Now that we converted everything to the newer block write cache interface, kill off the queue flush_flags and queueable flush entries. Signed-off-by: Jens Axboe <axboe@fb.com>
* dm: switch to using blk_queue_write_cache()Jens Axboe2016-04-121-4/+4
| | | | | Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* dm snapshot: disallow the COW and origin devices from being identicalDingXiang2016-03-101-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise loading a "snapshot" table using the same device for the origin and COW devices, e.g.: echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap will trigger: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1958.989655] PGD 0 [ 1958.991903] Oops: 0000 [#1] SMP ... [ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G IO 4.5.0-rc5.snitm+ #150 ... [ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000 [ 1959.091865] RIP: 0010:[<ffffffffa040efba>] [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.104295] RSP: 0018:ffff88032a957b30 EFLAGS: 00010246 [ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001 [ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00 [ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001 [ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80 [ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80 [ 1959.150021] FS: 00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000 [ 1959.159047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0 [ 1959.173415] Stack: [ 1959.175656] ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc [ 1959.183946] ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000 [ 1959.192233] ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf [ 1959.200521] Call Trace: [ 1959.203248] [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot] [ 1959.211986] [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot] [ 1959.219469] [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod] [ 1959.226656] [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod] [ 1959.234328] [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod] [ 1959.241129] [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod] [ 1959.248607] [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod] [ 1959.255307] [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20 [ 1959.261816] [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 1959.268615] [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0 [ 1959.274637] [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100 [ 1959.281726] [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [ 1959.288814] [<ffffffff81216449>] SyS_ioctl+0x79/0x90 [ 1959.294450] [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71 ... [ 1959.323277] RIP [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.333090] RSP <ffff88032a957b30> [ 1959.336978] CR2: 0000000000000098 [ 1959.344121] ---[ end trace b049991ccad1169e ]--- Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899 Cc: stable@vger.kernel.org Signed-off-by: Ding Xiang <dingxiang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: rename target's per_bio_data_size to per_io_data_sizeMike Snitzer2016-02-221-3/+3
| | | | | | Request-based DM will also make use of per_bio_data_size. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: optimize dm_mq_queue_rq()Mike Snitzer2016-02-221-0/+10
| | | | | | | | | | | | | | DM multipath is the only dm-mq target. But that aside, request-based DM only supports tables with a single target that is immutable. Leverage this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in the mapped_device when the table was made active. This saves the need to even take the read-side of the SRCU via dm_{get,put}_live_table. If the active DM table does not have an immutable target (e.g. "error" target was swapped in) then fallback to the slow-path where the target is looked up from the live table. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: set DM_TARGET_WILDCARD feature on "error" targetMike Snitzer2016-02-221-0/+14
| | | | | | | | | | | | | The DM_TARGET_WILDCARD feature indicates that the "error" target may replace any target; even immutable targets. This feature will be useful to preserve the ability to replace the "multipath" target even once it is formally converted over to having the DM_TARGET_IMMUTABLE feature. Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that .map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined in the target_type. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* block: Inline blk_integrity in struct gendiskMartin K. Petersen2015-10-211-41/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now the_integrity profile has been dynamically allocated and attached to struct gendisk after the disk has been made active. This causes problems because NVMe devices need to register the profile prior to the partition table being read due to a mandatory metadata buffer requirement. In addition, DM goes through hoops to deal with preallocating, but not initializing integrity profiles. Since the integrity profile is small (4 bytes + a pointer), Christoph suggested moving it to struct gendisk proper. This requires several changes: - Moving the blk_integrity definition to genhd.h. - Inlining blk_integrity in struct gendisk. - Removing the dynamic allocation code. - Adding helper functions which allow gendisk to set up and tear down the integrity sysfs dir when a disk is added/deleted. - Adding a blk_integrity_revalidate() callback for updating the stable pages bdi setting. - The calls that depend on whether a device has an integrity profile or not now key off of the bi->profile pointer. - Simplifying the integrity support routines in DM (Mike Snitzer). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: Replace SG_GAPS with new queue limits maskKeith Busch2015-08-191-13/+0
| | | | | | | | | | | | | | | | | | The SG_GAPS queue flag caused checks for bio vector alignment against PAGE_SIZE, but the device may have different constraints. This patch adds a queue limits so a driver with such constraints can set to allow requests that would have been unnecessarily split. The new gaps check takes the request_queue as a parameter to simplify the logic around invoking this function. This new limit makes the queue flag redundant, so removing it and all usage. Device-mappers will inherit the correct settings through blk_stack_limits(). Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: kill merge_bvec_fn() completelyKent Overstreet2015-08-131-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* Revert "block, dm: don't copy bios for request clones"Mike Snitzer2015-06-261-16/+9
| | | | | | | | | | | | | | | | | | | | | | | This reverts commit 5f1b670d0bef508a5554d92525f5f6d00d640b38. Justification for revert as reported in this dm-devel post: https://www.redhat.com/archives/dm-devel/2015-June/msg00160.html this change should not be pushed to mainline yet. Firstly, Christoph has a newer version of the patch that fixes silent data corruption problem: https://www.redhat.com/archives/dm-devel/2015-May/msg00229.html And the new version still depends on LLDDs to always complete requests to the end when error happens, while block API doesn't enforce such a requirement. If the assumption is ever broken, the inconsistency between request and bio (e.g. rq->__sector and rq->bio) will cause silent data corruption: https://www.redhat.com/archives/dm-devel/2015-June/msg00022.html Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Revert "dm: do not allocate any mempools for blk-mq request-based DM"Mike Snitzer2015-06-261-2/+2
| | | | | | | This reverts commit cbc4e3c1350beb47beab8f34ad9be3d34a20c705. Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: do not allocate any mempools for blk-mq request-based DMMike Snitzer2015-05-291-2/+2
| | | | | | | | | | | Do not allocate the io_pool mempool for blk-mq request-based DM (DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools(). Also refine __bind_mempools() to have more precise awareness of which mempools each type of DM device uses -- avoids mempool churn when reloading DM tables (particularly for DM_TYPE_REQUEST_BASED). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Merge remote-tracking branch 'jens/for-4.2/core' into dm-4.2Mike Snitzer2015-05-291-9/+16
|\
| * block, dm: don't copy bios for request clonesChristoph Hellwig2015-05-221-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | dm: fix reload failure of 0 path multipath mapping on blk-mq devicesJunichi Nomura2015-05-291-7/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-multipath accepts 0 path mapping. # echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev Such a mapping can be used to release underlying devices while still holding requests in its queue until working paths come back. However, once the multipath device is created over blk-mq devices, it rejects reloading of 0 path mapping: # echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \ | dmsetup create mpath1 # echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1 device-mapper: reload ioctl on mpath1 failed: Invalid argument Command failed With following kernel message: device-mapper: ioctl: can't change device type after initial table load. DM tries to inherit the current table type using dm_table_set_type() but it doesn't work as expected because of unnecessary check about whether the target type is hybrid or not. Hybrid type is for targets that work as either request-based or bio-based and not required for blk-mq or non blk-mq checking. Fixes: 65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: use bool function return values of true/false not 1/0Joe Perches2015-04-151-10/+10
| | | | | | | Use the normal return values for bool functions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: fall back to getting device using name_to_dev_t()Dan Ehrenberg2015-04-151-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a device is used as the root filesystem, it can't be built off of devices which are within the root filesystem (just like command line arguments to root=). For this reason, Linux has a pseudo-filesystem for root= and MD initialization (based on the function name_to_dev_t) which handles different ways of specifying devices including PARTUUID and major:minor. Switch to using name_to_dev_t() in dm_get_device(). Rather than having DM assume that all things which are not major:minor are paths in an already-mounted filesystem, change dm_get_device() to first attempt to look up the device in the filesystem, and if not found it will fall back to using name_to_dev_t(). In terms of backwards compatibility, there are some cases where behavior will be different: - If you have a file in the current working directory named 1:2 and you initialze DM there, then it will try to use that file rather than the disk with that major:minor pair as a backing device. - Similarly for other bdev types which name_to_dev_t() knows how to interpret, the previous behavior was to repeatedly check for the existence of the file (e.g., while waiting for rootfs to come up) but the new behavior is to use the name_to_dev_t() interpretation. For example, if you have a file named /dev/ubiblock0_0 which is a symlink to /dev/sda3, but it is not yet present when DM starts to initialize, then the name_to_dev_t() interpretation will take precedence. These incompatibilities would only show up in really strange setups with bad practices so we shouldn't have to worry about them. Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attrMike Snitzer2015-04-151-3/+3
| | | | | | | | | | | | | Request-based DM's blk-mq support defaults to off; but a user can easily change the default using the dm_mod.use_blk_mq module/boot option. Also, you can check what mode a given request-based DM device is using with: cat /sys/block/dm-X/dm/use_blk_mq This change enabled further cleanup and reduced work (e.g. the md->io_pool and md->rq_pool isn't created if using blk-mq). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add full blk-mq support to request-based DMMike Snitzer2015-04-151-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | Commit e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices") served as the first step toward fully utilizing blk-mq in request-based DM -- it enabled stacking an old-style (request_fn) request_queue ontop of the underlying blk-mq device(s). That first step didn't improve performance of DM multipath ontop of fast blk-mq devices (e.g. NVMe) because the top-level old-style request_queue was severely limited by the queue_lock. The second step offered here enables stacking a blk-mq request_queue ontop of the underlying blk-mq device(s). This unlocks significant performance gains on fast blk-mq devices, Keith Busch tested on his NVMe testbed and offered this really positive news: "Just providing a performance update. All my fio tests are getting roughly equal performance whether accessed through the raw block device or the multipath device mapper (~470k IOPS). I could only push ~20% of the raw iops through dm before this conversion, so this latest tree is looking really solid from a performance standpoint." Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Keith Busch <keith.busch@intel.com>
* dm: remove request-based DM queue's lld_busy_fn hookMike Snitzer2015-03-311-14/+0
| | | | | | | | | | | DM multipath is the only caller of blk_lld_busy() -- which calls a queue's lld_busy_fn hook. Request-based DM doesn't support stacking multipath devices so there is no reason to register the lld_busy_fn hook on a multipath device's queue using blk_queue_lld_busy(). As such, remove functions dm_lld_busy and dm_table_any_busy_target. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queuesKeith Busch2015-02-111-0/+13
| | | | | | | | | | | A DM device must inherit the QUEUE_FLAG_SG_GAPS flags from its underlying block devices' request queues. This fixes problems when submitting cloned requests to multipathed devices requiring virtually contiguous buffers. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: train hybrid target type detection to select blk-mq if appropriateMike Snitzer2015-02-091-15/+20
| | | | | | | | | | | Otherwise replacing the multipath target with the error target fails: device-mapper: ioctl: can't change device type after initial table load. The error target was mistakenly considered to be target type DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED even if the target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: allocate requests in target when stacking on blk-mq devicesMike Snitzer2015-02-091-5/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For blk-mq request-based DM the responsibility of allocating a cloned request is transfered from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces: 1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core. 2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks. dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: add presuspend_undo hook to target_typeMike Snitzer2014-11-191-7/+29
| | | | | | | | The DM thin-pool target now must undo the changes performed during pool_presuspend() so introduce presuspend_undo hook in target_type. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* dm: allow active and inactive tables to share dm_devsBenjamin Marzinski2014-10-051-68/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until this change, when loading a new DM table, DM core would re-open all of the devices in the DM table. Now, DM core will avoid redundant device opens (and closes when destroying the old table) if the old table already has a device open using the same mode. This is achieved by managing reference counts on the table_devices that DM core now stores in the mapped_device structure (rather than in the dm_table structure). So a mapped_device's active and inactive dm_tables' dm_dev lists now just point to the dm_devs stored in the mapped_device's table_devices list. This improvement in DM core's device reference counting has the side-effect of fixing a long-standing limitation of the multipath target: a DM multipath table couldn't include any paths that were unusable (failed). For example: if all paths have failed and you add a new, working, path to the table; you can't use it since the table load would fail due to it still containing failed paths. Now a re-load of a multipath table can include failed devices and when those devices become active again they can be used instantly. The device list code in dm.c isn't a straight copy/paste from the code in dm-table.c, but it's very close (aside from some variable renames). One subtle difference is that find_table_device for the tables_devices list will only match devices with the same name and mode. This is because we don't want to upgrade a device's mode in the active table when an inactive table is loaded. Access to the mapped_device structure's tables_devices list requires a mutex (tables_devices_lock), so that tables cannot be created and destroyed concurrently. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: propagate QUEUE_FLAG_NO_SG_MERGEJeff Moyer2014-08-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 05f1dd5 ("block: add queue flag for disabling SG merging") introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE. This gets set by default in blk_mq_init_queue for mq-enabled devices. The effect of the flag is to bypass the SG segment merging. Instead, the bio->bi_vcnt is used as the number of hardware segments. With a device mapper target on top of a device with QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments than a driver is prepared to handle. I ran into this when backporting the virtio_blk mq support. It triggerred this BUG_ON, in virtio_queue_rq: BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems); The queue's max is set here: blk_queue_max_segments(q, vblk->sg_elems-2); Basically, what happens is that a bio is built up for the dm device (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using bio_add_page. That path will call into __blk_recalc_rq_segments, so what you end up with is bi_phys_segments being much smaller than bi_vcnt (and bi_vcnt grows beyond the maximum sg elements). Then, when the bio is submitted, it gets cloned. When the cloned bio is submitted, it will end up in blk_recount_segments, here: if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags)) bio->bi_phys_segments = bio->bi_vcnt; and now we've set bio->bi_phys_segments to a number that is beyond what was registered as queue_max_segments by the driver. The right way to fix this is to propagate the queue flag up the stack. The rules for propagating the flag are simple: - if the flag is set for any underlying device, it must be set for the upper device - consequently, if the flag is not set for any underlying device, it should not be set for the upper device. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.16+
* dm table: make dm_table_supports_discards staticMikulas Patocka2014-08-011-36/+37
| | | | | | | | | The function dm_table_supports_discards is only called from dm-table.c:dm_table_set_restrictions(). So move it above dm_table_set_restrictions and make it static. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: remove symbol export for dm_set_device_limitsMike Snitzer2014-06-041-3/+2
| | | | | | | There is no need for code other than DM core to use dm_set_device_limits so remove its EXPORT_SYMBOL_GPL. Also, cleanup a couple whitespace nits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: add dm_table_run_md_queue_asyncMike Snitzer2014-03-271-0/+19
| | | | | | | | | | | | Introduce dm_table_run_md_queue_async() to run the request_queue of the mapped_device associated with a request-based DM table. Also add dm_md_get_queue() wrapper to extract the request_queue from a mapped_device. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
* dm: make dm_table_alloc_md_mempools staticMikulas Patocka2014-03-271-1/+1
| | | | | | | | Make the function dm_table_alloc_md_mempools static because it is not called from another file. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: remove unused buggy code that extends the targets arrayMikulas Patocka2014-01-071-20/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A device mapper table is allocated in the following way: * The function dm_table_create is called, it gets the number of targets as an argument -- it allocates a targets array accordingly. * For each target, we call dm_table_add_target. If we add more targets than were specified in dm_table_create, the function dm_table_add_target reallocates the targets array. However, this reallocation code is wrong - it moves the targets array to a new location, while some target constructors hold pointers to the array in the old location. The following DM target drivers save the pointer to the target structure, so they corrupt memory if the target array is moved: multipath, raid, mirror, snapshot, stripe, switch, thin, verity. Under normal circumstances, the reallocation function is not called (because dm_table_create is called with the correct number of targets), so the buggy reallocation code is not used. Prior to the fix "dm table: fail dm_table_create on dm_round_up overflow", the reallocation code could only be used in case the user specifies too large a value in param->target_count, such as 0xffffffff. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm table: fail dm_table_create on dm_round_up overflowMikulas Patocka2013-12-101-0/+5
| | | | | | | | | | | | | The dm_round_up function may overflow to zero. In this case, dm_table_create() must fail rather than go on to allocate an empty array with alloc_targets(). This fixes a possible memory corruption that could be caused by passing too large a number in "param->target_count". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm table: print error on preresume failureMike Snitzer2013-11-091-1/+4
| | | | | | | | | | | | | If preresume fails it is worth logging an error given that a device is left suspended due to the failure. This change was motivated by local preresume error logging that was added to the cache target ("preresume failed"). Elevating this target-agnostic context for the where the target-specific error occurred relative to the DM core's callouts makes sense. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com>