summaryrefslogtreecommitdiffstats
path: root/arch/um/drivers/ubd_kern.c
Commit message (Collapse)AuthorAgeFilesLines
* block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig2023-06-121-4/+4
| | | | | | | | | | | | | | The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ubd: remove commented out code in ubd_openChristoph Hellwig2023-06-121-7/+0
| | | | | | | | | | | | This code has been dead forever, make sure it doesn't show up in code searches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Richard Weinberger <richard@nod.at> Link: https://lore.kernel.org/r/20230608110258.189493-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the unused mode argument to ->releaseChristoph Hellwig2023-06-121-2/+2
| | | | | | | | | | | | The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a gendisk to ->openChristoph Hellwig2023-06-121-3/+2
| | | | | | | | | | | | ->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* um: Do not initialise statics to 0.Xin Gao2022-09-191-1/+1
| | | | | | | do not initialise statics to 0. Signed-off-by: Xin Gao <gaoxin@cdjrlc.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Use enum req_op where appropriateBart Van Assche2022-07-141-2/+2
| | | | | | | | | | | Improve static type checking by using type enum req_op instead of int where appropriate. Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-21-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove blk_cleanup_diskChristoph Hellwig2022-06-281-2/+2
| | | | | | | | | | blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ubd: don't set the discard_alignment queue limitChristoph Hellwig2022-05-031-1/+0
| | | | | | | | | | | | The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by ubd is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove QUEUE_FLAG_DISCARDChristoph Hellwig2022-04-171-2/+0
| | | | | | | | | | | | | | | | | | | Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* um: Fix WRITE_ZEROES in the UBD DriverFrédéric Danis2022-03-111-1/+7
| | | | | | | | | | Call to fallocate with FALLOC_FL_PUNCH_HOLE on a device backed by a sparse file can end up by missing data, zeroes data range, if the underlying file is used with a tool like bmaptool which will referenced only used spaces. Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um/drivers/ubd_kern: add error handling support for add_disk()Luis Chamberlain2021-10-211-4/+9
| | | | | | | | | | | | | | We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. ubd_disk_register() never returned an error, so just fix that now and let the caller handle the error condition. Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211015233028.2167651-8-mcgrof@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: drop unused includes in <linux/genhd.h>Christoph Hellwig2021-10-181-0/+1
| | | | | | | | | | Drop various include not actually used in genhd.h itself, and move the remaning includes closer together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ubd: use bvec_virtChristoph Hellwig2021-08-161-2/+1
| | | | | | | | | Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210804095634.460779-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-091-134/+26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: "A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail: - Fix io ticks wrap-around issue (Chunguang) - nvme-tcp sock locking fix (Maurizio) - s390-dasd fixes (Kees, Christoph) - blk_execute_rq polling support (Keith) - blk-cgroup RCU iteration fix (Yu) - nbd backend ID addition (Prasanna) - Partition deletion fix (Yufen) - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph) - Removal of now dead block request types due to IDE removal (Christoph) - Loop probing and control device cleanups (Christoph) - Device uevent fix (Christoph) - Misc cleanups/fixes (Tetsuo, Christoph)" * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits) blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs block: fix the problem of io_ticks becoming smaller nvme-tcp: can't set sk_user_data without write_lock loop: remove unused variable in loop_set_status() block: remove the bdgrab in blk_drop_partitions block: grab a device refcount in disk_uevent s390/dasd: Avoid field over-reading memcpy() dasd: unexport dasd_set_target_state block: check disk exist before trying to add partition ubd: remove dead code in ubd_setup_common nvme: use return value from blk_execute_rq() block: return errors from blk_execute_rq() nvme: use blk_execute_rq() for passthrough commands block: support polling through blk_execute_rq block: remove REQ_OP_SCSI_{IN,OUT} block: mark blk_mq_init_queue_data static loop: rewrite loop_exit using idr_for_each_entry loop: split loop_lookup loop: don't allow deleting an unspecified loop device loop: move loop_ctl_mutex locking into loop_add ...
| * ubd: remove dead code in ubd_setup_commonChristoph Hellwig2021-06-301-10/+0
| | | | | | | | | | | | | | | | | | | | Remove some leftovers of the fake major number parsing that cause complains from some compilers. Fixes: 2933a1b2c6f3 ("ubd: remove the code to register as the legacy IDE driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210628093937.1325608-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ubd: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig2021-06-301-30/+14
| | | | | | | | | | | | | | | | | | Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060759.3965724-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * ubd: remove the code to register as the legacy IDE driverChristoph Hellwig2021-06-301-94/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | With the legacy IDE driver long deprecated, and modern userspace being much more flexible about dev_t assignments there is no reason to fake a registration as the legacy IDE driver in ubd. This registeration is a little problematic as it registers the same request_queue for multiple gendisks, so just remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20210614060759.3965724-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | um: Fix stack pointer alignmentYiFei Zhu2021-06-171-2/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC assumes that stack is aligned to 16-byte on call sites [1]. Since GCC 8, GCC began using 16-byte aligned SSE instructions to implement assignments to structs on stack. When CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects os-Linux/sigio.c, write_sigio_thread: struct pollfds *fds, tmp; tmp = current_poll; Note that struct pollfds is exactly 16 bytes in size. GCC 8+ generates assembly similar to: movdqa (%rdi),%xmm0 movaps %xmm0,-0x50(%rbp) This is an issue, because movaps will #GP if -0x50(%rbp) is not aligned to 16 bytes [2], and how rbp gets assigned to is via glibc clone thread_start, then function prologue, going though execution trace similar to (showing only relevant instructions): sub $0x10,%rsi mov %rcx,0x8(%rsi) mov %rdi,(%rsi) syscall pop %rax pop %rdi callq *%rax push %rbp mov %rsp,%rbp The stack pointer always points to the topmost element on stack, rather then the space right above the topmost. On push, the pointer decrements first before writing to the memory pointed to by it. Therefore, there is no need to have the stack pointer pointer always point to valid memory unless the stack is poped; so the `- sizeof(void *)` in the code is unnecessary. On the other hand, glibc reserves the 16 bytes it needs on stack and pops itself, so by the call instruction the stack pointer is exactly the caller-supplied sp. It then push the 16 bytes of the return address and the saved stack pointer, so the base pointer will be 16-byte aligned if and only if the caller supplied sp is 16-byte aligned. Therefore, the caller must supply a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE` already satisfies. On a side note, musl is unaffected by this issue because it forces 16 byte alignment via `and $-16,%rsi` in its clone wrapper. Similarly, glibc i386 is also unaffected because it has `andl $0xfffffff0, %ecx`. To reproduce this bug, enable CONFIG_UML_RTC and CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call add_sigio_fd which will then cause write_sigio_thread to either go into segfault loop or panic with "Segfault with no mm". Similarly, signal stacks will be aligned by the host kernel upon signal delivery. `- sizeof(void *)` to sigaltstack is unconventional and extraneous. On a related note, initialization of longjmp buffers do require `- sizeof(void *)`. This is to account for the return address that would have been pushed to the stack at the call site. The reason for uml to respect 16-byte alignment, rather than telling GCC to assume 8-byte alignment like the host kernel since commit d9b0cde91c60 ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported"), is because uml links against libc. There is no reason to assume libc is also compiled with that flag and assumes 8-byte alignment rather than 16-byte. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838 [2] https://c9x.me/x86/html/file_module_x86_id_180.html Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Weinberger <richard@nod.at>
* Revert "um: allocate a guard page to helper threads"Johannes Berg2021-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | This reverts commit ef4459a6da09 ("um: allocate a guard page to helper threads"), it's broken in multiple ways: 1) the free no longer matches the alloc; and 2) more importantly, the set_memory_ro() causes allocation of page tables for the normal memory that doesn't have any, and that later causes corruption and crashes (usually but not always in vfree()). We could fix the first bug and use vmalloc() to work around the second, but set_memory_ro() actually doesn't do anything either so I'll just revert that as well. Reported-by: Benjamin Berg <benjamin@sipsolutions.net> Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: ubd: fix command line handling of ubdHajime Tazaki2021-01-261-2/+2
| | | | | | | | | | | | | This commit fixes a regression to handle command line parameters of ubd. With a simple line "./linux ubd0="./disk-ext4.img", it fails at ubd_setup_common(). The commit adds additional checks to the variables in order to properly parse the paremeters which previously worked. Fixes: ef3ba87cb7c9 ("um: ubd: Set device serial attribute from cmdline") Cc: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: allocate a guard page to helper threadsJohannes Berg2020-12-131-1/+1
| | | | | | | | | | | | | | | | We've been running into stack overflows in helper threads corrupting memory (e.g. because somebody put printf() or os_info() there), so to avoid those causing hard-to-debug issues later on, allocate a guard page for helper thread stacks and mark it read-only. Unfortunately, the crash dump at that point is useless as the stack tracer will try to backtrace the *kernel* thread, not the helper thread, but at least we don't survive to a random issue caused by corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Support dynamic IRQ allocationJohannes Berg2020-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | | It's cumbersome and error-prone to keep adding fixed IRQ numbers, and for proper device wakeup support for the virtio/vhost-user support we need to have different IRQs for each device. Even if in theory two IRQs (with and without wake) might be sufficient, it's much easier to reason about it when we have dynamic number assignment. It also makes it easier to add new devices that may dynamically exist or depending on the configuration, etc. Add support for this, up to 64 IRQs (the same limit as epoll FDs we have right now). Since it's not easy to port all the existing places to dynamic allocation (some data is statically initialized) keep the low numbers are reserved for the existing hard-coded IRQ numbers. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: ubd: Set device serial attribute from cmdlineChristopher Obbard2020-12-131-16/+62
| | | | | | | | | | | | | | Adds the ability to set the UBD device serial number from the commandline, disabling the serial number functionality by default. In some cases it may be useful to set a serial to the UBD device, such that downstream users (i.e. udev) can use this information to better describe the hardware to the user from the UML cmdline. In our case we use this parameter to create some entries under /dev/disk/by-ubd-id/ for each of the UBD devices passed through the UML cmdline. Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: ubd: Submit all data segments atomicallyGabriel Krisman Bertazi2020-12-131-76/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ | cmd_header | segment 0 | segment 1 | segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <chris.obbard@collabora.com> Reported-by: Martyn Welch <martyn@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: ubd: Retry buffer read on any kind of errorGabriel Krisman Bertazi2020-03-291-4/+4
| | | | | | | | | | Should bulk_req_safe_read return an error, we want to retry the read, otherwise, even though no IO will be done, os_write_file might still end up writing garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: ubd: Prevent buffer overrun on command completionGabriel Krisman Bertazi2020-03-291-1/+3
| | | | | | | | | | | | On the hypervisor side, when completing commands and the pipe is full, we retry writing only the entries that failed, by offsetting io_req_buffer, but we don't reduce the number of bytes written, which can cause a buffer overrun of io_req_buffer, and write garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2020-01-291-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...
| * compat_ioctl: ubd, aoe: use blkdev_compat_ptr_ioctlArnd Bergmann2020-01-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | These drivers implement the HDIO_GET_IDENTITY and CDROMVOLREAD ioctl commands, which are compatible between 32-bit and 64-bit user space and traditionally handled by compat_blkdev_driver_ioctl(). As a prerequisite to removing that function, make both drivers use blkdev_compat_ptr_ioctl() as their .compat_ioctl callback. Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | um: ubd: use 64-bit time_t where possibleArnd Bergmann2019-12-181-5/+5
|/ | | | | | | | | | | | | | | | | | | | The ubd code suffers from a possible y2038 overflow on 32-bit architectures, both for the cow header and the os_file_modtime() function. Replace time_t with time64_t to extend the ubd_kern side as much as possible. Whether this makes a difference for the user side depends on the host libc implementation that may use either 32-bit or 64-bit time_t. For the cow file format, the header contains an unsigned 32-bit timestamp, which is good until y2106, passing this through a 'long long' gives us a consistent interpretation between 32-bit and 64-bit um kernels. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* um-ubd: Entrust re-queue to the upper layersAnton Ivanov2019-10-291-2/+6
| | | | | | | | | | Fixes crashes due to ubd requeue logic conflicting with the block-mq logic. Crash is reproducible in 5.0 - 5.3. Fixes: 53766defb8c8 ("um: Clean-up command processing in UML UBD driver") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* um: Add SPDX headers for files in arch/um/driversAlex Dewar2019-09-151-1/+1
| | | | | | | Convert files to use SPDX header. All files are licensed under the GPLv2. Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Do not unlock mutex that is not hold.Daniel Walter2019-05-071-2/+2
| | | | | | | | | Return error instead of trying to unlock a mutex that is not hold. Signed-off-by: Daniel Walter <dwalter@google.com> Reviewed-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Fix for a possible OOPS in ubd initializationAnton Ivanov2019-03-061-3/+3
| | | | | | | | | | | | | If the ubd device failed to allocate a queue during initialization it tried call blk_cleanup_queue resulting in an oops. This patch simplifies the cleanup logic and ensures that blk_queue_cleanup is called only if there is a valid queue. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Remove obsolete reenable_XX callsAnton Ivanov2018-12-271-1/+0
| | | | | | | | | | reenable_fd has been a NOP since the introduction of the EPOLL based interrupt controller. reenable_channel() is no longer needed as the flow control is now handled via the write IRQs on the channel. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Add support for DISCARD in the UBD DriverAnton Ivanov2018-12-271-11/+54
| | | | | | | | | | | | | | | Support for DISCARD and WRITE_ZEROES in the ubd driver using fallocate. DISCARD is enabled by default and can be disabled using a new UBD command line flag. If the underlying fs on which the UBD image is stored does not support DISCARD the support for both DISCARD and WRITE_ZEROES is turned off. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Remove unsafe printks from the io threadAnton Ivanov2018-12-271-25/+17
| | | | | | | | | | | Printk out of the io thread has been proven to be unsafe. This is not surprising as the thread is part of the UML hypervisor code. It is not supposed to invoke any kernel code/resources. It is necesssary to pass the error to the block IO layer and let it Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Clean-up command processing in UML UBD driverAnton Ivanov2018-12-271-30/+47
| | | | | | | | Clean-up command processing and return BLK_STS_NOTSUP for uknown commands. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: Switch to block-mq constants in the UML UBD driverAnton Ivanov2018-12-271-28/+38
| | | | | | | | Switch to block mq-constants for both commands, error codes and various computations. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* ubd: fix missing initialization of io_reqAnton Ivanov2018-11-081-2/+1
| | | | | | | | | | | | | The SYNC path doesn't initialize io_req->error, which can cause random errors. Before the conversion to blk-mq, we always completed requests with BLK_STS_OK status, but now we actually look at the error field and this issue becomes apparent. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> [axboe: fixed up commit message to explain what is actually going on] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ubd: fix missing lock around request issueJens Axboe2018-11-071-2/+7
| | | | | | | | | | We need to hold the device lock (and disable interrupts) while writing new commands, or we could be interrupted while that is happening and read invalid requests in the completion path. Fixes: 4e6da0fe8058 ("um: Convert ubd driver to blk-mq") Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ubd: remove use of blk_rq_map_sgChristoph Hellwig2018-10-181-104/+54
| | | | | | | | | | There is no good reason to create a scatterlist in the ubd driver, it can just iterate the request directly. Signed-off-by: Christoph Hellwig <hch@lst.de> [rw: Folded in improvements as discussed with hch and jens] Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* um: Convert ubd driver to blk-mqRichard Weinberger2018-10-141-85/+93
| | | | | | | | | Convert the driver to the modern blk-mq framework. As byproduct we get rid of our open coded restart logic and let blk-mq handle it. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: genhd: add 'groups' argument to device_add_diskHannes Reinecke2018-09-281-1/+1
| | | | | | | | | | | | | | Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* treewide: kmalloc() -> kmalloc_array()Kees Cook2018-06-121-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* proc: introduce proc_create_single{,_data}Christoph Hellwig2018-05-161-14/+2
| | | | | | | | | Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
* Epoll based IRQ controllerAnton Ivanov2018-02-191-2/+2
| | | | | | | | | | | | | | | | 1. Removes the need to walk the IRQ/Device list to determine who triggered the IRQ. 2. Improves scalability (up to several times performance improvement for cases with 10s of devices). 3. Improves UML baseline IO performance for one disk + one NIC use case by up to 10%. 4. Introduces write poll triggered IRQs. 5. Prerequisite for introducing high performance mmesg family of functions in network IO. 6. Fixes RNG shutdown which was leaking a file descriptor Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* block: introduce new block status code typeChristoph Hellwig2017-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* um: UBD ImprovementsAnton Ivanov2016-12-141-26/+142
| | | | | | | | | | | | | UBD at present is extremely slow because it handles only one request at a time in the IO thread and IRQ handler. The single request at a time is replaced by handling multiple requests as well as necessary workarounds for short reads/writes. Resulting performance improvement in disk IO - 30% Signed-off-by: Anton Ivanov <aivanov@kot-begemot.co.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
* um: track 'parent' device in a local variableDan Williams2016-06-221-2/+3
| | | | | | | | | | | In preparation for the removal of 'driverfs_dev' from 'struct gendisk' use a local variable to track the parented vs un-parented case in ubd_disk_register(). Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* block, drivers: add REQ_OP_FLUSH operationMike Christie2016-06-071-1/+1
| | | | | | | | | | | This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>