summaryrefslogtreecommitdiffstats
path: root/block/ioprio.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-blockLinus Torvalds2022-01-121-32/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - Unify where the struct request handling code is located in the blk-mq code (Christoph) - Header cleanups (Christoph) - Clean up the io_context handling code (Christoph, me) - Get rid of ->rq_disk in struct request (Christoph) - Error handling fix for add_disk() (Christoph) - request allocation cleanusp (Christoph) - Documentation updates (Eric, Matthew) - Remove trivial crypto unregister helper (Eric) - Reduce shared tag overhead (John) - Reduce poll_stats memory overhead (me) - Known indirect function call for dio (me) - Use atomic references for struct request (me) - Support request list issue for block and NVMe (me) - Improve queue dispatch pinning (Ming) - Improve the direct list issue code (Keith) - BFQ improvements (Jan) - Direct completion helper and use it in mmc block (Sebastian) - Use raw spinlock for the blktrace code (Wander) - fsync error handling fix (Ye) - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me) * tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits) MAINTAINERS: add entries for block layer documentation docs: block: remove queue-sysfs.rst docs: sysfs-block: document virt_boundary_mask docs: sysfs-block: document stable_writes docs: sysfs-block: fill in missing documentation from queue-sysfs.rst docs: sysfs-block: add contact for nomerges docs: sysfs-block: sort alphabetically docs: sysfs-block: move to stable directory block: don't protect submit_bio_checks by q_usage_counter block: fix old-style declaration nvme-pci: fix queue_rqs list splitting block: introduce rq_list_move block: introduce rq_list_for_each_safe macro block: move rq_list macros to blk-mq.h block: drop needless assignment in set_task_ioprio() block: remove unnecessary trailing '\' bio.h: fix kernel-doc warnings block: check minor range in device_add_disk() block: use "unsigned long" for blk_validate_block_size(). block: fix error unwinding in device_add_disk ...
| * block: move set_task_ioprio to blk-ioc.cChristoph Hellwig2021-12-161-32/+0
| | | | | | | | | | | | | | | | | | | | Keep set_task_ioprio with the other low-level code that accesses the io_context structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211209063131.18537-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2)Davidlohr Bueso2021-12-101-0/+3
|/ | | | | | | | | | | | | do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. Serialize ioprio_get to take the tasklist_lock in this case, just like it's set counterpart. Fixes: d69b78ba1de (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}()) Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Check ADMIN before NICE for IOPRIO_CLASS_RTAlistair Delva2021-11-151-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Booting to Android userspace on 5.14 or newer triggers the following SELinux denial: avc: denied { sys_nice } for comm="init" capability=23 scontext=u:r:init:s0 tcontext=u:r:init:s0 tclass=capability permissive=0 Init is PID 0 running as root, so it already has CAP_SYS_ADMIN. For better compatibility with older SEPolicy, check ADMIN before NICE. Fixes: 9d3a39a5f1e4 ("block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE") Signed-off-by: Alistair Delva <adelva@google.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Paul Moore <paul@paul-moore.com> Cc: selinux@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org # v5.14+ Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Serge Hallyn <serge@hallyn.com> Link: https://lore.kernel.org/r/20211115181655.3608659-1-adelva@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: fix default IO priority handlingDamien Le Moal2021-08-181-3/+3
| | | | | | | | | | | | | | | | | | | | | The default IO priority is the best effort (BE) class with the normal priority level IOPRIO_NORM (4). However, get_task_ioprio() returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent with the defined default and have both of these functions return the default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when the user did not define another default IO priority for the task. In include/uapi/linux/ioprio.h, introduce the IOPRIO_BE_NORM macro as an alias to IOPRIO_NORM to clarify that this default level applies to the BE priotity class. In include/linux/ioprio.h, define the macro IOPRIO_DEFAULT as IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) and use this new macro when setting a priority to the default. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-7-damien.lemoal@wdc.com [axboe: drop unnecessary lightnvm change] Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Introduce IOPRIO_NR_LEVELSDamien Le Moal2021-08-181-2/+1
| | | | | | | | | | | | | | | | The BFQ scheduler and ioprio_check_cap() both assume that the RT priority class (IOPRIO_CLASS_RT) can have up to 8 different priority levels, similarly to the BE class (IOPRIO_CLASS_iBE). This is controlled using the IOPRIO_BE_NR macro , which is badly named as the number of levels also applies to the RT class. Introduce the class independent IOPRIO_NR_LEVELS macro, defined to 8, to make things clear. Keep the old IOPRIO_BE_NR macro definition as an alias for IOPRIO_NR_LEVELS. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-6-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Fix sys_ioprio_set(.which=IOPRIO_WHO_PGRP) task iterationPeter Zijlstra2021-04-081-2/+9
| | | | | | | | | | | do_each_pid_thread() { } while_each_pid_thread() is a double loop and thus break doesn't work as expected. Also, it should be used under tasklist_lock because otherwise we can race against change_pid() for PGID/SID. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/YG7Q5C4Rb5dx5GFx@hirez.programming.kicks-ass.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: grant IOPRIO_CLASS_RT to CAP_SYS_NICEKhazhismel Kumykov2020-09-011-1/+1
| | | | | | | | | | | CAP_SYS_ADMIN is too broad, and ionice fits into CAP_SYS_NICE's grouping. Retain CAP_SYS_ADMIN permission for backwards compatibility. Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2020-08-231-1/+1
| | | | | | | | | | Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
* docs: block: convert to ReSTMauro Carvalho Chehab2019-07-151-1/+1
| | | | | | | | | | | Rename the block documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
* block: add SPDX tags to block layer files missing licensing informationChristoph Hellwig2019-04-301-0/+1
| | | | | | | | | Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add ioprio_check_cap functionAdam Manzanares2018-05-311-6/+16
| | | | | | | | | | | | | | Aio per command iopriority support introduces a second interface between userland and the kernel capable of passing iopriority. The aio interface also needs the ability to verify that the submitting context has sufficient privileges to submit IOPRIO_RT commands. This patch creates the ioprio_check_cap function to be used by the ioprio_set system call and also by the aio interface. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* block: Add fallthrough markers to switch statementsBart Van Assche2017-06-211-1/+2
| | | | | | | | | | This patch suppresses gcc 7 warnings about falling through in switch statements when building with W=1. From the gcc documentation: The -Wimplicit-fallthrough=3 warning is enabled by -Wextra. See also https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Warning-Options.html. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Optimize ioprio_best()Bart Van Assche2017-04-191-11/+1
| | | | | | | | | | | | | Since ioprio_best() translates IOPRIO_CLASS_NONE into IOPRIO_CLASS_BE and since lower numerical priority values represent a higher priority a simple numerical comparison is sufficient. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Adam Manzanares <adam.manzanares@wdc.com> Tested-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* sched/headers: Prepare to move the task_lock()/unlock() APIs to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | <linux/sched/task.h> But first update the code that uses these facilities with the new header. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | <linux/sched/user.h> We are going to split <linux/sched/user.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/user.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* block: use for_each_thread() in sys_ioprio_set()/sys_ioprio_get()Tetsuo Handa2017-02-221-4/+4
| | | | | | | | | | | | | | | | IOPRIO_WHO_USER case in sys_ioprio_set()/sys_ioprio_get() are using while_each_thread(), which is unsafe under RCU lock according to commit 0c740d0afc3bff0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Use for_each_thread() (via for_each_process_thread()) which is safe under RCU lock. Link: http://lkml.kernel.org/r/201702011947.DBD56740.OMVHOLOtSJFFFQ@I-love.SAKURA.ne.jp Link: http://lkml.kernel.org/r/1486041779-4401-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: fix use-after-free in sys_ioprio_get()Omar Sandoval2016-07-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_task_ioprio() accesses the task->io_context without holding the task lock and thus can race with exit_io_context(), leading to a use-after-free. The reproducer below hits this within a few seconds on my 4-core QEMU VM: #define _GNU_SOURCE #include <assert.h> #include <unistd.h> #include <sys/syscall.h> #include <sys/wait.h> int main(int argc, char **argv) { pid_t pid, child; long nproc, i; /* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */ syscall(SYS_ioprio_set, 1, 0, 0x6000); nproc = sysconf(_SC_NPROCESSORS_ONLN); for (i = 0; i < nproc; i++) { pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { pid = fork(); assert(pid != -1); if (pid == 0) { _exit(0); } else { child = wait(NULL); assert(child == pid); } } } pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { /* ioprio_get(IOPRIO_WHO_PGRP, 0); */ syscall(SYS_ioprio_get, 2, 0); } } } for (;;) { /* ioprio_get(IOPRIO_WHO_PGRP, 0); */ syscall(SYS_ioprio_get, 2, 0); } return 0; } This gets us KASAN dumps like this: [ 35.526914] ================================================================== [ 35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c [ 35.530009] Read of size 2 by task ioprio-gpf/363 [ 35.530009] ============================================================================= [ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected [ 35.530009] ----------------------------------------------------------------------------- [ 35.530009] Disabling lock debugging due to kernel taint [ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360 [ 35.530009] ___slab_alloc+0x55d/0x5a0 [ 35.530009] __slab_alloc.isra.20+0x2b/0x40 [ 35.530009] kmem_cache_alloc_node+0x84/0x200 [ 35.530009] create_task_io_context+0x2b/0x370 [ 35.530009] get_task_io_context+0x92/0xb0 [ 35.530009] copy_process.part.8+0x5029/0x5660 [ 35.530009] _do_fork+0x155/0x7e0 [ 35.530009] SyS_clone+0x19/0x20 [ 35.530009] do_syscall_64+0x195/0x3a0 [ 35.530009] return_from_SYSCALL_64+0x0/0x6a [ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060 [ 35.530009] __slab_free+0x27b/0x3d0 [ 35.530009] kmem_cache_free+0x1fb/0x220 [ 35.530009] put_io_context+0xe7/0x120 [ 35.530009] put_io_context_active+0x238/0x380 [ 35.530009] exit_io_context+0x66/0x80 [ 35.530009] do_exit+0x158e/0x2b90 [ 35.530009] do_group_exit+0xe5/0x2b0 [ 35.530009] SyS_exit_group+0x1d/0x20 [ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4 [ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080 [ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001 [ 35.530009] ================================================================== Fix it by grabbing the task lock while we poke at the io_context. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* pidns: fix set/getpriority and ioprio_set/get in PRIO_USER modeBen Segall2015-11-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of the current pid namespace. This is in contrast to both the other modes of setpriority and the example of kill(-1). Fix this. getpriority and ioprio have the same failure mode, fix them too. Eric said: : After some more thinking about it this patch sounds justifiable. : : My goal with namespaces is not to build perfect isolation mechanisms : as that can get into ill defined territory, but to build well defined : mechanisms. And to handle the corner cases so you can use only : a single namespace with well defined results. : : In this case you have found the two interfaces I am aware of that : identify processes by uid instead of by pid. Which quite frankly is : weird. Unfortunately the weird unexpected cases are hard to handle : in the usual way. : : I was hoping for a little more information. Changes like this one we : have to be careful of because someone might be depending on the current : behavior. I don't think they are and I do think this make sense as part : of the pid namespace. Signed-off-by: Ben Segall <bsegall@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ambrose Feinstein <ambrose@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: Fix computation of merged request priorityJan Kara2014-10-311-6/+8
| | | | | | | | | | | | | | Priority of a merged request is computed by ioprio_best(). If one of the requests has undefined priority (IOPRIO_CLASS_NONE) and another request has priority from IOPRIO_CLASS_BE, the function will return the undefined priority which is wrong. Fix the function to properly return priority of a request with the defined priority. Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: move ioprio.c from fs/ to block/Jens Axboe2014-05-191-0/+241
Like commit f9c78b2b, move this block related file outside of fs/ and into the core block directory, block/. Signed-off-by: Jens Axboe <axboe@fb.com>