summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-raid.c
Commit message (Collapse)AuthorAgeFilesLines
* dm: eliminate 'split_discard_bios' flag from DM target interfaceMike Snitzer2019-02-201-5/+9
| | | | | | | | | There is no need to have DM core split discards on behalf of a DM target now that blk_queue_split() handles splitting discards based on the queue_limits. A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix false -EBUSY when handling check/repair messageHeinz Mauelshagen2018-12-181-2/+1
| | | | | | | | | | | | Sending a check/repair message infrequently leads to -EBUSY instead of properly identifying an active resync. This occurs because raid_message() is testing recovery bits in a racy way. Fix by calling decipher_sync_action() from raid_message() to properly identify the idle state of the RAID device. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: avoid bitmap with raid4/5/6 journal deviceHeinz Mauelshagen2018-10-181-1/+1
| | | | | | | With raid4/5/6, journal device and write intent bitmap are mutually exclusive. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: remove bogus const from decipher_sync_action() return typeGeert Uytterhoeven2018-09-171-1/+1
| | | | | | | | | | | | | With gcc-4.1.2: drivers/md/dm-raid.c:3357: warning: type qualifiers ignored on function return type Remove the "const" keyword to fix this. Fixes: 36a240a706d43383 ("dm raid: fix RAID leg rebuild errors") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: bump target version, update comments and documentationHeinz Mauelshagen2018-09-061-4/+6
| | | | | | | | Bump target version to reflect the documented fixes are available. Also fix some code comments (typos and clarity). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix RAID leg rebuild errorsHeinz Mauelshagen2018-09-061-34/+46
| | | | | | | | | | | | | | | | | | On fast devices such as NVMe, a flaw in rs_get_progress() results in false target status output when userspace lvm2 requests leg rebuilds (symptom of the failure is device health chars 'aaaaaaaa' instead of expected 'aAaAAAAA' causing lvm2 to fail). The correct sync action state definitions already exist in decipher_sync_action() so fix rs_get_progress() to use it. Change decipher_sync_action() to return an enum rather than a string for the sync states and call it from rs_get_progress(). Introduce sync_str() to translate from enum to the string that is needed by raid_status(). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix rebuild of specific devices by updating superblockHeinz Mauelshagen2018-09-061-0/+5
| | | | | | | | | | Update superblock when particular devices are requested via rebuild (e.g. lvconvert --replace ...) to avoid spurious failure with the "New device injected into existing raid set without 'delta_disks' or 'rebuild' parameter specified" error message. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix stripe adding reshape deadlockHeinz Mauelshagen2018-09-061-8/+3
| | | | | | | | | | | | | | When initiating a stripe adding reshape, a deadlock between md_stop_writes() waiting for the sync thread to stop and the running sync thread waiting for inactive stripes occurs (this frequently happens on single-core but rarely on multi-core systems). Fix this deadlock by setting MD_RECOVERY_WAIT to have the main MD resynchronization thread worker (md_do_sync()) bail out when initiating the reshape via constructor arguments. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix reshape race on small devicesHeinz Mauelshagen2018-09-061-47/+1
| | | | | | | | | | | | | | | | | | | | | | | Loading a new mapping table, the dm-raid target's constructor retrieves the volatile reshaping state from the raid superblocks. When the new table is activated in a following resume, the actual reshape position is retrieved. The reshape driven by the previous mapping can already have finished on small and/or fast devices thus updating raid superblocks about the new raid layout. This causes the actual array state (e.g. stripe size reshape finished) to be inconsistent with the one in the new mapping, causing hangs with left behind devices. This race does not occur with usual raid device sizes but with small ones (e.g. those created by the lvm2 test suite). Fix by no longer transferring stale/inconsistent raid_set state during preresume. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2018-08-181-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - a new driver for Rohm BU21029 touch controller - new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free - updates to Atmel, eeti. pxrc and iforce drivers - assorted driver cleanups and fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits) MAINTAINERS: Add PhoenixRC Flight Controller Adapter Input: do not use WARN() in input_alloc_absinfo() Input: mark expected switch fall-throughs Input: raydium_i2c_ts - use true and false for boolean values Input: evdev - switch to bitmap API Input: gpio-keys - switch to bitmap_zalloc() Input: elan_i2c_smbus - cast sizeof to int for comparison bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free() md: Avoid namespace collision with bitmap API dm: Avoid namespace collision with bitmap API Input: pm8941-pwrkey - add resin entry Input: pm8941-pwrkey - abstract register offsets and event code Input: iforce - reorganize joystick configuration lists Input: atmel_mxt_ts - move completion to after config crc is updated Input: atmel_mxt_ts - don't report zero pressure from T9 Input: atmel_mxt_ts - zero terminate config firmware file Input: atmel_mxt_ts - refactor config update code to add context struct Input: atmel_mxt_ts - config CRC may start at T71 Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE ...
| * md: Avoid namespace collision with bitmap APIAndy Shevchenko2018-08-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | bitmap API (include/linux/bitmap.h) has 'bitmap' prefix for its methods. On the other hand MD bitmap API is special case. Adding 'md' prefix to it to avoid name space collision. No functional changes intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Shaohua Li <shli@kernel.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
* | dm raid: don't use 'const' in function returnArnd Bergmann2018-06-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A newly introduced function has 'const int' as the return type, but as "make W=1" reports, that has no meaning: drivers/md/dm-raid.c:510:18: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] This changes the return type to plain 'int'. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: 552aa679f2657431 ("dm raid: use rs_is_raid*()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | treewide: Use struct_size() for kmalloc()-familyKees Cook2018-06-061-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: Kees Cook <keescook@chromium.org>
* dm raid: fix parse_raid_params() variable range issueHeinz Mauelshagen2018-04-041-8/+19
| | | | | | | | | | | | | | | | parse_raid_params() compares variable "int value" with INT_MAX. E.g. related Coverity report excerpt: CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue] 1433 if (value > INT_MAX) { Fix by changing checks to avoid INT_MAX. Whilst on it, avoid unnecessary checks against constants and add check for sane recovery speed min/max. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix nosync statusHeinz Mauelshagen2018-04-031-1/+2
| | | | | | | | | | | Fix a race for "nosync" activations providing "aa.." device health characters and "0/N" sync ratio rather than "AA..." and "N/N". Occurs when status for the raid set is retrieved during resume before the MD sync thread starts and clears the MD_RECOVERY_NEEDED flag. Cc: stable@vger.kernel.org # 4.16+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: allow targets to return output from messages they are sentMike Snitzer2018-04-031-1/+2
| | | | | | | | Could be useful for a target to return stats or other information. If a target does DMEMIT() anything to @result from its .message method then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix incorrect sync_ratio when degradedJonathan Brassow2018-03-061-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upstream commit 4102d9de6d375 ("dm raid: fix rs_get_progress() synchronization state/ratio") in combination with commit 7c29744ecce ("dm raid: simplify rs_get_progress()") introduced a regression by incorrectly reporting a sync_ratio of 0 for degraded raid sets. This caused lvm2 to fail to repair raid legs automatically. Fix by identifying the degraded state by checking the MD_RECOVERY_INTR flag and returning mddev->recovery_cp in case it is set. MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR MD_RECOVERY_NEEDED ] when a RAID member fails. It then shuts down any sync thread that is running and leaves us with all MD_RECOVERY_* flags cleared. The bug occurs if a status is requested in the short time it takes to shut down any sync thread and clear the flags, because we were keying in on the MD_RECOVERY_NEEDED - understanding it to be the initial phase of a “recover” sync thread. However, this is an incorrect interpretation if MD_RECOVERY_INTR is also set. This also explains why the bug only happened when automatic repair was enabled and not a normal ‘manual’ method. It is impossible to react quick enough to hit the problematic window without it being automated. Fix passes automatic repair tests. Fixes: 7c29744ecce ("dm raid: simplify rs_get_progress()") Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Merge tag 'for-4.16/dm-changes' of ↵Linus Torvalds2018-01-311-126/+254
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - DM core fixes to ensure that bio submission follows a depth-first tree walk; this is critical to allow forward progress without the need to use the bioset's BIOSET_NEED_RESCUER. - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure. - DM core cleanups and improvements to make bio-based DM more efficient (e.g. reduced memory footprint as well leveraging per-bio-data more). - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages the more direct IO submission path in the block layer; this mode is used by DM multipath and also optimizes targets like DM thin-pool that stack directly on NVMe data device. - DM multipath improvements to factor out legacy SCSI-only (e.g. scsi_dh) code paths to allow for more optimized support for NVMe multipath. - A fix for DM multipath path selectors (service-time and queue-length) to select paths in a more balanced way; largely academic but doesn't hurt. - Numerous DM raid target fixes and improvements. - Add a new DM "unstriped" target that enables Intel to workaround firmware limitations in some NVMe drives that are striped internally (this target also works when stacked above the DM "striped" target). - Various Documentation fixes and improvements. - Misc cleanups and fixes across various DM infrastructure and targets (e.g. bufio, flakey, log-writes, snapshot). * tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits) dm cache: Documentation: update default migration_throttling value dm mpath selector: more evenly distribute ties dm unstripe: fix target length versus number of stripes size check dm thin: fix trailing semicolon in __remap_and_issue_shared_cell dm table: fix NVMe bio-based dm_table_determine_type() validation dm: various cleanups to md->queue initialization code dm mpath: delay the retry of a request if the target responded as busy dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure dm log writes: fix max length used for kstrndup dm: backfill missing calls to mutex_destroy() dm snapshot: use mutex instead of rw_semaphore dm flakey: check for null arg_name in parse_features() dm thin: extend thinpool status format string with omitted fields dm thin: fixes in thin-provisioning.txt dm thin: document representation of <highest mapped sector> when there is none dm thin: fix documentation relative to low water mark threshold dm cache: be consistent in specifying sectors and SI units in cache.txt dm cache: delete obsoleted paragraph in cache.txt dm cache: fix grammar in cache-policies.txt ...
| * dm raid: make raid_sets symbol staticWei Yongjun2018-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | Fixes the following sparse warning: drivers/md/dm-raid.c:33:1: warning: symbol 'raid_sets' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: use rs_is_raid*()Heinz Mauelshagen2017-12-131-8/+8
| | | | | | | | | | | | | | Cleanup, no functional change. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: simplify rs_get_progress()Heinz Mauelshagen2017-12-131-20/+3
| | | | | | | | | | | | | | | | No need to calculate the reshaping progress because mddev->curr_resync_completed holds it. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: ensure 'a' chars during reshapeHeinz Mauelshagen2017-12-131-0/+9
| | | | | | | | | | | | | | During reshape, 'A' chars were reported in status rather than 'a'. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: stop keeping raid set frozen altogetherHeinz Mauelshagen2017-12-131-38/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to avoid redoing synchronization/recovery/reshape partially, the raid set got frozen until after all passed in table line flags had been cleared. The related table reload sequence had to be precisely followed, or reshaping may lead to data corruption caused by the active mapping carrying on with a reshape when the inactive mapping already had retrieved a stale reshape position. Harden by retrieving the actual resync/recovery/reshape position during resume whilst the active table is suspended thus avoiding to keep the raid set frozen altogether. This prevents superfluous redoing of an already resynchronized or recovered segment and, most importantly, potential for redoing of an already reshaped segment causing data corruption. Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed") Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: validate current raid sets redundancyHeinz Mauelshagen2017-12-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Verifying the current raid sets redundancy based on retrieved superblock content has to use the superblock's raid level (e.g. raid0), not the constructor requested one (e.g. raid10). Using the requested raid level of raid10 lead to a "divide error" on raid0 which defines data copies divided by to be zero. Also check for bogus data copies. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: bump target version to reflect numerous fixesMike Snitzer2017-12-081-1/+1
| | | | | | | | | | | | Also update Documentation accordingly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: small cleanup and remove unsed "struct raid_set" memberHeinz Mauelshagen2017-12-081-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | Move raid_resume()'s setting of 'rw' and 'in_sync' to just prior to mddev_resume(). Also, remove unused 'bitmap_loaded' member from "struct raid_set". No functional changes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: fix rs_get_progress() synchronization state/ratioHeinz Mauelshagen2017-12-081-31/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix various sync state issues causing racy/bogus sync ratio, sync_action ad health chars in dm_status() info output. Sync ratio could be N/N (i.e. 100%) shortly after raid set creation, i.e. creating a new RaidLV or upconverting a linear LV to raid1 thus: "0 2097152 raid raid1 2 Aa 2097162/2097152 recover 0 0 -" instead of: "0 2097152 raid raid1 2 Aa 0/2097152 idle 0 0 -" Sync action could be non-idle, when the MD thread was done with io. Health chars could be 'A' when they should be 'a' for a short time before a resynchonization started. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: avoid passing array_in_sync variable to raid_status() calleesHeinz Mauelshagen2017-12-081-14/+16
| | | | | | | | | | | | | | | | | | | | The raid_status() function passes the bool array_in_sync variable around providing synchronization state of the MD array. Replace it with a runtime flag. This will avoid a pattern of having to pass discrete variables to various functions. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: display a consistent copy of the MD status via raid_status()Heinz Mauelshagen2017-12-081-16/+18
| | | | | | | | | | | | | | | | | | | | | | The MD sync thread updates recovery flags providing state of any running, idle, frozen, recovering, reshaping, ... activity it performs and updates respective flags asynchronously versus dm processing raid_status(). To close that race window, take a single copy of the flags and pass it into its callees. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: fix raid_resume() to keep raid set frozen as neededHeinz Mauelshagen2017-12-081-3/+9
| | | | | | | | | | | | | | | | | | | | | | During a reshape request: if userspace reloads a "raid" table multiple times, resulting in multiple superblock reads, the raid set needs to stay frozen until all config changes (chunk size, layout data_offset, delta_disks) have been stored in the superblocks and respective flags cleared. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: add component device size checks to avoid runtime failureHeinz Mauelshagen2017-12-081-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | Check all component data device sizes versus calculated size. Reject if device(s) are too small. Otherwise, MD will fail the operation by accessing beyond the end of the data device. An example use-case is that growing bitmap won't fit any more and the MD runtime will report an error when DM raid should catch this earlier. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: fix raid set size revalidationHeinz Mauelshagen2017-12-081-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The raid set size is being revalidated unconditionally before a reshaping conversion is started. MD requires the size to only be reduced in case of a stripe removing (i.e. shrinking) reshape but not when growing because the raid array has to stay small until after the growing reshape finishes. Fix by avoiding the size revalidation in preresume unless a shrinking reshape is requested. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: correct resizing state relative to reshape space in ctrHeinz Mauelshagen2017-12-081-4/+6
| | | | | | | | | | | | | | | | | | Pay attention to existing reshape space to define if a raid set needs resizing. Otherwise we can hit "Can't resize a reshaping raid set" when a reshape is being requested. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: consume sizes after md_finish_reshape() completes changing themHeinz Mauelshagen2017-12-081-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | The md raid personalities call md_finish_reshape() at the end of a reshape conversion which adjusts rdev->sectors. Correct/check rdev->sectors before initiating a reshape and raise the recovery pointer accordingly. Otherwise, the DM raid coordinated reshape will fail. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: fix deadlock caused by premature md_stop_writes()Heinz Mauelshagen2017-12-081-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | md_stop_writes() is called in raid_presuspend() causing deadlocks on bios submitted afterwards -- which happens on loaded raid sets with conversion requests. Fix by moving md_stop_writes() to raid_postsuspend(). NOTE: when the recovery's frozen (MD_RECOVERY_FROZEN), writes haven't been started (or are already stopped) so don't stop them again. Also remove superfluous readonly setting. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | md: introduce new personality funciton start()Song Liu2017-12-111-0/+9
|/ | | | | | | | | | | | | | | | | | | In do_md_run(), md threads should not wake up until the array is fully initialized in md_run(). However, in raid5_run(), raid5-cache may wake up mddev->thread to flush stripes that need to be written back. This design doesn't break badly right now. But it could lead to bad bug in the future. This patch tries to resolve this problem by splitting start up work into two personality functions, run() and start(). Tasks that do not require the md threads should go into run(), while task that require the md threads go into start(). r5l_load_log() is moved to raid5_start(), so it is not called until the md threads are started in do_md_run(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
* Merge tag 'for-4.15/dm-changes-2' of ↵Linus Torvalds2017-11-171-6/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper updates from Mike Snitzer: "Given your expected travel I figured I'd get these fixes to you sooner rather than later. - a DM multipath stable@ fix to silence an annoying error message that isn't _really_ an error - a DM core @stable fix for discard support that was enabled for an entire DM device despite only having partial support for discards due to a mix of discard capabilities across the underlying devices. - a couple other DM core discard fixes. - a DM bufio @stable fix that resolves a 32-bit overflow" * tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix integer overflow when limiting maximum cache size dm: clear all discard attributes in queue_limits when discards are disabled dm: do not set 'discards_supported' in targets that do not need it dm: discard support requires all targets in a table support discards dm mpath: remove annoying message of 'blk_get_request() returned -11'
| * dm: do not set 'discards_supported' in targets that do not need itMike Snitzer2017-11-161-6/+0
| | | | | | | | | | | | | | | | The DM target's 'discards_supported' flag is intended to act as an override. Meaning, even if the underlying storage doesn't support discards the DM target will. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds2017-11-141-3/+9
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MD update from Shaohua Li: "This update mostly includes bug fixes: - md-cluster now supports raid10 from Guoqing - raid5 PPL fixes from Artur - badblock regression fix from Bo - suspend hang related fixes from Neil - raid5 reshape fixes from Neil - raid1 freeze deadlock fix from Nate - memleak fixes from Zdenek - bitmap related fixes from Me and Tao - other fixes and cleanups" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (33 commits) md: free unused memory after bitmap resize md: release allocated bitset sync_set md/bitmap: clear BITMAP_WRITE_ERROR bit before writing it to sb md: be cautious about using ->curr_resync_completed for ->recovery_offset badblocks: fix wrong return value in badblocks_set if badblocks are disabled md: don't check MD_SB_CHANGE_CLEAN in md_allow_write md-cluster: update document for raid10 md: remove redundant variable q raid1: remove obsolete code in raid1_write_request md-cluster: Use a small window for raid10 resync md-cluster: Suspend writes in RAID10 if within range md-cluster/raid10: set "do_balance = 0" if area is resyncing md: use lockdep_assert_held raid1: prevent freeze_array/wait_all_barriers deadlock md: use TASK_IDLE instead of blocking signals md: remove special meaning of ->quiesce(.., 2) md: allow metadata update while suspending. md: use mddev_suspend/resume instead of ->quiesce() md: move suspend_hi/lo handling into core md code md: don't call bitmap_create() while array is quiesced. ...
| * md: always hold reconfig_mutex when calling mddev_suspend()NeilBrown2017-11-011-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most often mddev_suspend() is called with reconfig_mutex held. Make this a requirement in preparation a subsequent patch. Also require reconfig_mutex to be held for mddev_resume(), partly for symmetry and partly to guarantee no races with incr/decr of mddev->suspend. Taking the mutex in r5c_disable_writeback_async() is a little tricky as this is called from a work queue via log->disable_writeback_work, and flush_work() is called on that while holding ->reconfig_mutex. If the work item hasn't run before flush_work() is called, the work function will not be able to get the mutex. So we use mddev_trylock() inside the wait_event() call, and have that abort when conf->log is set to NULL, which happens before flush_work() is called. We wait in mddev->sb_wait and ensure this is woken when any of the conditions change. This requires waking mddev->sb_wait in mddev_unlock(). This is only like to trigger extra wake_ups of threads that needn't be woken when metadata is being written, and that doesn't happen often enough that the cost would be noticeable. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
| * md: rename some drivers/md/ files to have an "md-" prefixMike Snitzer2017-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Motivated by the desire to illiminate the imprecise nature of DM-specific patches being unnecessarily sent to both the MD maintainer and mailing-list. Which is born out of the fact that DM files also reside in drivers/md/ Now all MD-specific files in drivers/md/ start with either "raid" or "md-" and the MAINTAINERS file has been updated accordingly. Shaohua: don't change module name Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com>
* | dm raid: fix panic when attempting to force a raid to syncHeinz Mauelshagen2017-11-101-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Requesting a sync on an active raid device via a table reload (see 'sync' parameter in Documentation/device-mapper/dm-raid.txt) skips the super_load() call that defines the superblock size (rdev->sb_size) -- resulting in an oops if/when super_sync()->memset() is called. Fix by moving the initialization of the superblock start and size out of super_load() to the caller (analyse_superblocks). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge tag 'for-4.14/dm-fixes' of ↵Linus Torvalds2017-10-051-5/+6
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a stable fix for the alignment of the event number reported at the end of the 'DM_LIST_DEVICES' ioctl. - a couple stable fixes for the DM crypt target. - a DM raid health status reporting fix. * tag 'for-4.14/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm raid: fix incorrect status output at the end of a "recover" process dm crypt: reject sector_size feature if device length is not aligned to it dm crypt: fix memory leak in crypt_ctr_cipher_old() dm ioctl: fix alignment of event number in the device list
| * dm raid: fix incorrect status output at the end of a "recover" processJonathan Brassow2017-10-051-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three important fields that indicate the overall health and status of an array: dev_health, sync_ratio, and sync_action. They tell us the condition of the devices in the array, and the degree to which the array is synchronized. This commit fixes a condition that is reported incorrectly. When a member of the array is being rebuilt or a new device is added, the "recover" process is used to synchronize it with the rest of the array. When the process is complete, but the sync thread hasn't yet been reaped, it is possible for the state of MD to be: mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ] curr_resync_completed = <max dev size> (but not MaxSector) and all rdevs to be In_sync. This causes the 'array_in_sync' output parameter that is passed to rs_get_progress() to be computed incorrectly and reported as 'false' -- or not in-sync. This in turn causes the dev_health status characters to be reported as all 'a', rather than the proper 'A'. This can cause erroneous output for several seconds at a time when tools will want to be checking the condition due to events that are raised at the end of a sync process. Fix this by properly calculating the 'array_in_sync' return parameter in rs_get_progress(). Also, remove an unnecessary intermediate 'recovery_cp' variable in rs_get_progress(). Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | dm-raid: fix a race condition in request handlingShaohua Li2017-09-271-1/+1
|/ | | | | | | | | | | | raid_map calls pers->make_request, which missed the suspend check. Fix it with the new md_handle_request API. Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start()) Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
* dm raid: bump target versionHeinz Mauelshagen2017-07-251-1/+1
| | | | | | | | | | | Bumo dm-raid target version to 1.12.1 to reflect that commit cc27b0c78c ("md: fix deadlock between mddev_suspend() and md_write_start()") is available. This version change allows userspace to detect that MD fix is available. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: avoid mddev->suspended accessHeinz Mauelshagen2017-07-251-5/+7
| | | | | | | Use runtime flag to ensure that an mddev gets suspended/resumed just once. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: fix activation check in validate_raid_redundancy()Heinz Mauelshagen2017-07-251-5/+5
| | | | | | | | | | | | | | | | | | | | | | | During growing reshapes (i.e. stripes being added to a raid set), the new stripe images are not in-sync and not part of the raid set until the reshape is started. LVM2 has to request multiple table reloads involving superblock updates in order to reflect proper size of SubLVs in the cluster. Before a stripe adding reshape starts, validate_raid_redundancy() fails as a result of that because it checks the total number of devices against the number of rebuild ones rather than the actual ones in the raid set (as retrieved from the superblock) thus resulting in failed raid4/5/6/10 redundancy checks. E.g. convert 3 stripes -> 7 stripes raid5 (which only allows for maximum 1 device to fail) requesting +4 delta disks causing 4 devices to rebuild during reshaping thus failing activation. To fix this, move validate_raid_redundancy() to get access to the current raid_set members. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm raid: remove WARN_ON() in raid10_md_layout_to_format()Heinz Mauelshagen2017-07-251-2/+3
| | | | | Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* Merge tag 'for-4.13/dm-changes' of ↵Linus Torvalds2017-07-061-3/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add the ability to use select or poll /dev/mapper/control to wait for events from multiple DM devices. - Convert DM's printk macros over to using pr_<level> macros. - Add a big-endian variant of plain64 IV to dm-crypt. - Add support for zoned (aka SMR) devices to DM core. DM kcopyd was also improved to provide a sequential write feature needed by zoned devices. - Introduce DM zoned target that provides support for host-managed zoned devices, the result dm-zoned device acts as a drive-managed interface to the underlying host-managed device. - A DM raid fix to avoid using BUG() for error handling. * tag 'for-4.13/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm zoned: fix overflow when converting zone ID to sectors dm raid: stop using BUG() in __rdev_sectors() dm zoned: drive-managed zoned block device target dm kcopyd: add sequential write feature dm linear: add support for zoned block devices dm flakey: add support for zoned block devices dm: introduce dm_remap_zone_report() dm: fix REQ_OP_ZONE_REPORT bio handling dm: fix REQ_OP_ZONE_RESET bio handling dm table: add zoned block devices validation dm: convert DM printk macros to pr_<level> macros dm crypt: add big-endian variant of plain64 IV dm bio prison: use rb_entry() rather than container_of() dm ioctl: report event number in DM_LIST_DEVICES dm ioctl: add a new DM_DEV_ARM_POLL ioctl dm: add basic support for using the select or poll function