| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now block layer knows everything necessary to create and associate
icq's with requests. Move ioc_create_icq() to blk-ioc.c and update
get_request() such that, if elevator_type->icq_size is set, requests
are automatically associated with their matching icq's before
elv_set_request(). io_context reference is also managed by block core
on request alloc/free.
* Only ioprio/cgroup changed handling remains from cfq_get_cic().
Collapsed into cfq_set_request().
* This removes queue kicking on icq allocation failure (for now). As
icq allocation failure is rare and the only effect of queue kicking
achieved was possibily accelerating queue processing, this change
shouldn't be noticeable.
There is a larger underlying problem. Unlike request allocation,
icq allocation is not guaranteed to succeed eventually after
retries. The number of icq is unbound and thus mempool can't be the
solution either. This effectively adds allocation dependency on
memory free path and thus possibility of deadlock.
This usually wouldn't happen because icq allocation is not a hot
path and, even when the condition triggers, it's highly unlikely
that none of the writeback workers already has icq.
However, this is still possible especially if elevator is being
switched under high memory pressure, so we better get it fixed.
Probably the only solution is just bypassing elevator and appending
to dispatch queue on any elevator allocation failure.
* Comment added to explain how icq's are managed and synchronized.
This completes cleanup of io_context interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add elevator_ops->elevator_init_icq_fn() and restructure
cfq_create_cic() and rename it to ioc_create_icq().
The new function expects its caller to pass in io_context, uses
elevator_type->icq_cache, handles generic init, calls the new elevator
operation for elevator specific initialization, and returns pointer to
created or looked up icq. This leaves cfq_icq_pool variable without
any user. Removed.
This prepares for io_context interface cleanup and doesn't introduce
any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With kmem_cache managed by blk-ioc, io_cq exit/release can be moved to
blk-ioc too. The odd ->io_cq->exit/release() callbacks are replaced
with elevator_ops->elevator_exit_icq_fn() with unlinking from both ioc
and q, and freeing automatically handled by blk-ioc. The elevator
operation only need to perform exit operation specific to the elevator
- in cfq's case, exiting the cfqq's.
Also, clearing of io_cq's on q detach is moved to block core and
automatically performed on elevator switch and q release.
Because the q io_cq points to might be freed before RCU callback for
the io_cq runs, blk-ioc code should remember to which cache the io_cq
needs to be freed when the io_cq is released. New field
io_cq->__rcu_icq_cache is added for this purpose. As both the new
field and rcu_head are used only after io_cq is released and the
q/ioc_node fields aren't, they are put into unions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let elevators set ->icq_size and ->icq_align in elevator_type and
elv_register() and elv_unregister() respectively create and destroy
kmem_cache for icq.
* elv_register() now can return failure. All callers updated.
* icq caches are automatically named "ELVNAME_io_cq".
* cfq_slab_setup/kill() are collapsed into cfq_init/exit().
* While at it, minor indentation change for iosched_cfq.elevator_name
for consistency.
This will help moving icq management to block core. This doesn't
introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of icq management is about to be moved out of cfq into blk-ioc.
This patch prepares for it.
* Move cfqd->icq_list to request_queue->icq_list
* Make request explicitly point to icq instead of through elevator
private data. ->elevator_private[3] is replaced with sub struct elv
which contains icq pointer and priv[2]. cfq is updated accordingly.
* Meaningless clearing of ->elevator_private[0] removed from
elv_set_request(). At that point in code, the field was guaranteed
to be %NULL anyway.
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently io_context and cfq logics are mixed without clear boundary.
Most of io_context is independent from cfq but cfq_io_context handling
logic is dispersed between generic ioc code and cfq.
cfq_io_context represents association between an io_context and a
request_queue, which is a concept useful outside of cfq, but it also
contains fields which are useful only to cfq.
This patch takes out generic part and put it into io_cq (io
context-queue) and the rest into cfq_io_cq (cic moniker remains the
same) which contains io_cq. The following changes are made together.
* cfq_ttime and cfq_io_cq now live in cfq-iosched.c.
* All related fields, functions and constants are renamed accordingly.
* ioc->ioc_data is now "struct io_cq *" instead of "void *" and
renamed to icq_hint.
This prepares for io_context API cleanup. Documentation is currently
sparse. It will be added later.
Changes in this patch are mechanical and don't cause functional
change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
elevator_queue->ops points to the same ops struct ->elevator_type.ops
is pointing to. The only effect of caching it in elevator_queue is
shorter notation - it doesn't save any indirect derefence.
Relocate elevator_type->list which used only during module init/exit
to the end of the structure, rename elevator_queue->elevator_type to
->type, and replace elevator_queue->ops with elevator_queue->type.ops.
This doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
| |
Now that lazy paths are removed, cfqd_dead_key() is meaningless and
cic->q can be used whereever cic->key is used. Kill cic->key.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that cic's are immediately unlinked under both locks, there's no
need to count and drain cic's before module unload. RCU callback
completion is waited with rcu_barrier().
While at it, remove residual RCU operations on cic_list.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that all cic's are immediately unlinked from both ioc and queue,
lazy dropping from lookup path and trimming on elevator unregister are
unnecessary. Kill them and remove now unused elevator_ops->trim().
This also leaves call_for_each_cic() without any user. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cic is association between io_context and request_queue. A cic is
linked from both ioc and q and should be destroyed when either one
goes away. As ioc and q both have their own locks, locking becomes a
bit complex - both orders work for removal from one but not from the
other.
Currently, cfq tries to circumvent this locking order issue with RCU.
ioc->lock nests inside queue_lock but the radix tree and cic's are
also protected by RCU allowing either side to walk their lists without
grabbing lock.
This rather unconventional use of RCU quickly devolves into extremely
fragile convolution. e.g. The following is from cfqd going away too
soon after ioc and q exits raced.
general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:
[ 88.503444]
Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
RIP: 0010:[<ffffffff81397628>] [<ffffffff81397628>] cfq_exit_single_io_context+0x58/0xf0
...
Call Trace:
[<ffffffff81395a4a>] call_for_each_cic+0x5a/0x90
[<ffffffff81395ab5>] cfq_exit_io_context+0x15/0x20
[<ffffffff81389130>] exit_io_context+0x100/0x140
[<ffffffff81098a29>] do_exit+0x579/0x850
[<ffffffff81098d5b>] do_group_exit+0x5b/0xd0
[<ffffffff81098de7>] sys_exit_group+0x17/0x20
[<ffffffff81b02f2b>] system_call_fastpath+0x16/0x1b
The only real hot path here is cic lookup during request
initialization and avoiding extra locking requires very confined use
of RCU. This patch makes cic removal from both ioc and request_queue
perform double-locking and unlink immediately.
* From q side, the change is almost trivial as ioc->lock nests inside
queue_lock. It just needs to grab each ioc->lock as it walks
cic_list and unlink it.
* From ioc side, it's a bit more difficult because of inversed lock
order. ioc needs its lock to walk its cic_list but can't grab the
matching queue_lock and needs to perform unlock-relock dancing.
Unlinking is now wholly done from put_io_context() and fast path is
optimized by using the queue_lock the caller already holds, which is
by far the most common case. If the ioc accessed multiple devices,
it tries with trylock. In unlikely cases of fast path failure, it
falls back to full double-locking dance from workqueue.
Double-locking isn't the prettiest thing in the world but it's *far*
simpler and more understandable than RCU trick without adding any
meaningful overhead.
This still leaves a lot of now unnecessary RCU logics. Future patches
will trim them.
-v2: Vivek pointed out that cic->q was being dereferenced after
cic->release() was called. Updated to use local variable @this_q
instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ioprio/cgroup change was handled by marking the changed state in ioc
and, on the following access to the ioc, performing RCU-protected
iteration through all cic's grabbing the matching queue_lock.
This patch moves the changed state to each cic. When ioprio or cgroup
changes, the respective bit is set on all cic's of the ioc and when
each of those cic (not ioc) is accessed, change is applied for that
specific ioc-queue pair.
This also fixes the following two race conditions between setting and
clearing of changed states.
* Missing barrier between assign/load of ioprio and ioprio_changed
allowed applying old ioprio.
* Change requests could happen between application of change and
clearing of changed variables.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the following changes to prepare for ioc/cic management cleanup.
* Add cic->q so that ioc can determine the associated queue without
querying cfq. This will eventually replace ->key.
* Factor out cfq_release_cic() from cic_free_func(). This function
assumes that the caller handled locking.
* Rename __cfq_exit_single_io_context() to cfq_exit_cic() and make it
take only @cic.
* Restructure cfq_cic_link() for future updates.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* blk_get_queue() is peculiar in that it returns 0 on success and 1 on
failure instead of 0 / -errno or boolean. Update it such that it
returns %true on success and %false on failure.
* Make sure the caller checks for the return value.
* Separate out __blk_get_queue() which doesn't check whether @q is
dead and put it in blk.h. This will be used later.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ignoring copy_io() during fork, io_context can be allocated from two
places - current_io_context() and set_task_ioprio(). The former is
always called from local task while the latter can be called from
different task. The synchornization between them are peculiar and
dubious.
* current_io_context() doesn't grab task_lock() and assumes that if it
saw %NULL ->io_context, it would stay that way until allocation and
assignment is complete. It has smp_wmb() between alloc/init and
assignment.
* set_task_ioprio() grabs task_lock() for assignment and does
smp_read_barrier_depends() between "ioc = task->io_context" and "if
(ioc)". Unfortunately, this doesn't achieve anything - the latter
is not a dependent load of the former. ie, if ioc itself were being
dereferenced "ioc->xxx", it would mean something (not sure what tho)
but as the code currently stands, the dependent read barrier is
noop.
As only one of the the two test-assignment sequences is task_lock()
protected, the task_lock() can't do much about race between the two.
Nothing prevents current_io_context() and set_task_ioprio() allocating
its own ioc for the same task and overwriting the other's.
Also, set_task_ioprio() can race with exiting task and create a new
ioc after exit_io_context() is finished.
ioc get/put doesn't have any reason to be complex. The only hot path
is accessing the existing ioc of %current, which is simple to achieve
given that ->io_context is never destroyed as long as the task is
alive. All other paths can happily go through task_lock() like all
other task sub structures without impacting anything.
This patch updates ioc get/put so that it becomes more conventional.
* alloc_io_context() is replaced with get_task_io_context(). This is
the only interface which can acquire access to ioc of another task.
On return, the caller has an explicit reference to the object which
should be put using put_io_context() afterwards.
* The functionality of current_io_context() remains the same but when
creating a new ioc, it shares the code path with
get_task_io_context() and always goes through task_lock().
* get_io_context() now means incrementing ref on an ioc which the
caller already has access to (be that an explicit refcnt or implicit
%current one).
* PF_EXITING inhibits creation of new io_context and once
exit_io_context() is finished, it's guaranteed that both ioc
acquisition functions return %NULL.
* All users are updated. Most are trivial but
smp_read_barrier_depends() removal from cfq_get_io_context() needs a
bit of explanation. I suppose the original intention was to ensure
ioc->ioprio is visible when set_task_ioprio() allocates new
io_context and installs it; however, this wouldn't have worked
because set_task_ioprio() doesn't have wmb between init and install.
There are other problems with this which will be fixed in another
patch.
* While at it, use NUMA_NO_NODE instead of -1 for wildcard node
specification.
-v2: Vivek spotted contamination from debug patch. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* int return from put_io_context() wasn't used by anybody. Make it
return void like other put functions and docbook-fy the function
comment.
* Reorder dummy declarations for !CONFIG_BLOCK case a bit.
* Make alloc_ioc_context() use __GFP_ZERO allocation, take init out of
if block and drop 0'ing.
* Docbook-fy current_io_context() comment.
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cfq allocates per-queue id using ida and uses it to index cic radix
tree from io_context. Move it to q->id and allocate on queue init and
free on queue release. This simplifies cfq a bit and will allow for
further improvements of io context life-cycle management.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
| |
There are a number of QUEUE_FLAG_DEAD tests. Add blk_queue_dead()
macro and use it.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only user left for blk_insert_request() is sx8 and it can be
trivially switched to use blk_execute_rq_nowait() - special requests
aren't included in io stat and sx8 doesn't use block layer tagging.
Switch sx8 and kill blk_insert_requeset().
This patch doesn't introduce any functional difference.
Only compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| | |
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: use new generic {enable,disable}_percpu_irq() routines
drivers/net/ethernet/tile: use skb_frag_page() API
asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
arch/tile: fix double-free bug in homecache_free_pages()
arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Also prototype the "compat" functions so they can be referenced
from C code.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use atomic-long operations instead of looping around cmpxchg().
[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* '3.2-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits)
iscsi-target: Fix hex2bin warn_unused compile message
target: Don't return an error if disabling unsupported features
target/rd: fix or rewrite the copy routine
target/rd: simplify the page/offset computation
target: remove the unused se_dev_list
target/file: walk properly over sg list
target: remove unused struct fields
target: Fix page length in emulated INQUIRY VPD page 86h
target: Handle 0 correctly in transport_get_sectors_6()
target: Don't return an error status for 0-length READ and WRITE
iscsi-target: Use kmemdup rather than duplicating its implementation
iscsi-target: Add missing F_BIT for iscsi_tm_rsp
iscsi-target: Fix residual count hanlding + remove iscsi_cmd->residual_count
target: Reject SCSI data overflow for fabrics using transport_generic_map_mem_to_cmd
target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fields
target: remove the t_tasks_bidi se_cmd field
target: remove the t_tasks_fua se_cmd field
target: remove the se_ordered_node se_cmd field
target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fields
target: Drop config_item_name usage in fabric TFO->free_wwn()
...
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Some are never used, some are set but never read, dev_hoq_count is
incremented and decremented, but never read.
Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
And use a SCF_BIDI flag instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
And use a SCF_FUA flag instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We never walk ordered_cmd_list in the se_device, so remove all code related
to supporting it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We already have a perfectly valid se_device pointer in the command, so
remove the mostly useless duplicates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Convert to unsigned bit fields for active I/O shutdown fields.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch removes legacy usage of PYX_TRANSPORT_* return codes in a number
of locations and addresses cases where transport_generic_request_failure()
was returning the incorrect sense upon CHECK_CONDITION status after the
v3.1 converson to use errno return codes.
This includes the conversion of transport_generic_request_failure() to
process cmd->scsi_sense_reason and handle extra TCM_RESERVATION_CONFLICT
before calling transport_send_check_condition_and_sense() to queue up
response status. It also drops PYX_TRANSPORT_OUT_OF_MEMORY_RESOURCES legacy
usgae, and returns TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE w/ a response
for these cases.
transport_generic_allocate_tasks(), transport_generic_new_cmd(), backend
SCF_SCSI_DATA_SG_IO_CDB ->do_task(), and emulated ->execute_task() have
all been updated to set se_cmd->scsi_sense_reason and return errno codes
universally upon failure. This includes cmd->scsi_sense_reason assignment
in target_core_alua.c, target_core_pr.c and target_core_cdb.c emulation code.
Finally it updates fabric modules to remove the legacy usage, and for
TFO->new_cmd_map() callers forwards return values outside of fabric code.
iscsi-target has also been updated to remove a handful of special cases
related to the cleanup and signaling QUEUE_FULL handling w/ ft_write_pending()
(v2: Drop extra SCF_SCSI_CDB_EXCEPTION check during failure from
transport_generic_new_cmd, and re-add missing task->task_error_status
assignment in transport_complete_task)
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that. The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root. Without grabbing references. Sure, at the moment of call it had
been pinned down by what we have in *path. And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.
It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?". Dereferencing is not safe and apparmor ended up stepping into
that. d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace. As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.
The fix is fairly straightforward, even though it's bigger than I'd like:
* prepend_path() root argument becomes const.
* __d_path() is never called with NULL/NULL root. It was a kludge
to start with. Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops. apparmor and tomoyo are using it.
* __d_path() returns NULL on path outside of root. The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail. Those who don't want that can (and do)
use d_path().
* __d_path() root argument becomes const. Everyone agrees, I hope.
* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount. In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there. Handling of sysctl()-triggered weirdness is moved to that place.
* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead. That's the other remaining caller of __d_path(),
BTW.
* seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine). However, if it gets path not reachable
from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).
Reviewed-by: John Johansen <john.johansen@canonical.com>
ACKed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ftrace: Fix hash record accounting bug
perf: Fix parsing of __print_flags() in TP_printk()
jump_label: jump_label_inc may return before the code is patched
ftrace: Remove force undef config value left for testing
tracing: Restore system filter behavior
tracing: fix event_subsystem ref counting
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Though not all events have field 'prev_pid', it was allowed to do this:
# echo 'prev_pid == 100' > events/sched/filter
but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap
entire filter of events) broke it without any reason.
Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intr_remapping: Fix section mismatch in ir_dev_scope_init()
intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev()
x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions
x86, AMD: Correct align_va_addr documentation
x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms
x86/mrst: Battery fixes
x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode
x86: Fix "Acer Aspire 1" reboot hang
x86/mtrr: Resolve inconsistency with Intel processor manual
x86: Document rdmsr_safe restrictions
x86, microcode: Fix the failure path of microcode update driver init code
Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup
x86/mpparse: Account for bus types other than ISA and PCI
x86, mrst: Change the pmic_gpio device type to IPC
mrst: Added some platform data for the SFI translations
x86,mrst: Power control commands update
x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
x86, UV: Fix UV2 hub part number
x86: Add user_mode_vm check in stack_overflow_check
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
northbridge functions
I've received complaints that the numa_node attribute for family
15h model 00-0fh (e.g. Interlagos) northbridge functions shows
-1 instead of the proper node ID.
Correct this with attached quirks (similar to quirks for other
AMD CPU families used in multi-socket systems).
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Frank Arnold <frank.arnold@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \ \
| | |/ /
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix loss of notification with multi-event
perf, x86: Force IBS LVT offset assignment for family 10h
perf, x86: Disable PEBS on SandyBridge chips
trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter
perf session: Fix crash with invalid CPU list
perf python: Fix undefined symbol problem
perf/x86: Enable raw event access to Intel offcore events
perf: Don't use -ENOSPC for out of PMU resources
perf: Do not set task_ctx pointer in cpuctx if there are no events in the context
perf/x86: Fix PEBS instruction unwind
oprofile, x86: Fix crash when unloading module (nmi timer mode)
oprofile: Fix crash when unloading module (hr timer mode)
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When you do:
$ perf record -e cycles,cycles,cycles noploop 10
You expect about 10,000 samples for each event, i.e., 10s at
1000samples/sec. However, this is not what's happening. You
get much fewer samples, maybe 3700 samples/event:
$ perf report -D | tail -15
Aggregated stats:
TOTAL events: 10998
MMAP events: 66
COMM events: 2
SAMPLE events: 10930
cycles stats:
TOTAL events: 3644
SAMPLE events: 3644
cycles stats:
TOTAL events: 3642
SAMPLE events: 3642
cycles stats:
TOTAL events: 3644
SAMPLE events: 3644
On a Intel Nehalem or even AMD64, there are 4 counters capable
of measuring cycles, so there is plenty of space to measure those
events without multiplexing (even with the NMI watchdog active).
And even with multiplexing, we'd expect roughly the same number
of samples per event.
The root of the problem was that when the event that caused the buffer
to become full was not the first event passed on the cmdline, the user
notification would get lost. The notification was sent to the file
descriptor of the overflowed event but the perf tool was not polling
on it. The perf tool aggregates all samples into a single buffer,
i.e., the buffer of the first event. Consequently, it assumes
notifications for any event will come via that descriptor.
The seemingly straight forward solution of moving the waitq into the
ringbuffer object doesn't work because of life-time issues. One could
perf_event_set_output() on a fd that you're also blocking on and cause
the old rb object to be freed while its waitq would still be
referenced by the blocked thread -> FAIL.
Therefore link all events to the ringbuffer and broadcast the wakeup
from the ringbuffer object to all possible events that could be waited
upon. This is rather ugly, and we're open to better solutions but it
works for now.
Reported-by: Stephane Eranian <eranian@google.com>
Finished-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \ \
| |_|/ /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, x86: Avoid unnecessary overflow in sched_clock
sched: Fix buglet in return_cfs_rq_runtime()
sched: Avoid SMT siblings in select_idle_sibling() if possible
sched: Set the command name of the idle tasks in SMP kernels
sched, rt: Provide means of disabling cross-cpu bandwidth sharing
sched: Document wait_for_completion_*() return values
sched_fair: Fix a typo in the comment describing update_sd_lb_stats
sched: Add a comment to effective_load() since it's a pain
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In UP systems, the idle task is initialized using the init_task
structure from which the command name is taken (currently "swapper").
In SMP systems, one idle task per CPU is forked by the worker thread
from which the task structure is copied. The command name is, therefore,
"kworker/0:0" or "kworker/0:1", if not updated. Since such update was
lacking, all idle tasks in SMP systems were incorrectly named. This
longtime bug was not discovered immediately, because there is no /proc/0
entry - the bug only becomes apparent when tracing is enabled.
This patch sets the command name of the idle tasks in SMP systems to the
name that is used in the INIT_TASK structure suffixed by a slash and the
number of the CPU.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: integer overflow in vmw_kms_update_layout_ioctl()
drm/radeon/kms: fix 2D tiling CS support on EG/CM
drm/radeon/kms: fix scanout of 2D tiled buffers on EG/CM
drm: Fix lack of CRTC disable for drm_crtc_helper_set_config(.fb=NULL)
drm/radeon/kms: add some new pci ids
drm/radeon/kms: Skip ACPI call to ATIF when possible
drm/radeon/kms: Hide debugging message
drm/radeon/kms: add some loop timeouts in pageflip code
drm/nv50/disp: silence compiler warning
drm/nouveau: fix oopses caused by clear being called on unpopulated ttms
drm/nouveau: Keep RAMIN heap within the channel.
drm/nvd0/disp: fix sor dpms typo, preventing dpms on in some situations
drm/nvc0/gr: fix TP init for transform feedback offset queries
drm/nouveau: add dumb ioctl support
|
| | |/
| |/|
| | |
| | |
| | |
| | | |
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix S3/S4 problem on machines with VREF-pin mute-LED
ALSA: hda_intel - revert a quirk that affect VIA chipsets
ALSA: hda - Avoid touching mute-VREF pin for IDT codecs
firmware: Sigma: Fix endianess issues
firmware: Sigma: Skip header during CRC generation
firmware: Sigma: Prevent out of bounds memory access
ALSA: usb-audio - Support for Roland GAIA SH-01 Synthesizer
ASoC: Supply dcs_codes for newer WM1811 revisions
ASoC: Error out if we can't generate a LRCLK at all for WM8994
ASoC: Correct name of Speyside Main Speaker widget
ASoC: skip resume of soc-audio devices without codecs
ASoC: cs42l51: Fix off-by-one for reg_cache_size
ASoC: drop support for PlayPaq with WM8510
ASoC: mpc8610: tell the CS4270 codec that it's the master
ASoC: cs4720: use snd_soc_cache_sync()
ASoC: SAMSUNG: Fix build error
ASoC: max9877: Update register if either val or val2 is changed
ASoC: Fix wrong define for AD1836_ADC_WORD_OFFSET
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently the SigmaDSP firmware loader only works correctly on little-endian
systems. Fix this by using the proper endianess conversion functions.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The SigmaDSP firmware loader currently does not perform enough boundary size
checks when processing the firmware. As a result it is possible that a
malformed firmware can cause an out of bounds memory access.
This patch adds checks which ensure that both the action header and the payload
are completely inside the firmware data boundaries before processing them.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
|
|\ \ \ \
| |_|/ /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS
ipv4: flush route cache after change accept_local
sch_red: fix red_change
Revert "udp: remove redundant variable"
bridge: master device stuck in no-carrier state forever when in user-stp mode
ipv4: Perform peer validation on cached route lookup.
net/core: fix rollback handler in register_netdevice_notifier
sch_red: fix red_calc_qavg_from_idle_time
bonding: only use primary address for ARP
ipv4: fix lockdep splat in rt_cache_seq_show
sch_teql: fix lockdep splat
net: fec: Select the FEC driver by default for i.MX SoCs
isdn: avoid copying too long drvid
isdn: make sure strings are null terminated
netlabel: Fix build problems when IPv6 is not enabled
sctp: better integer overflow check in sctp_auth_create_key()
sctp: integer overflow in sctp_auth_create_key()
ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.
net: Fix corruption in /proc/*/net/dev_mcast
mac80211: fix race between the AGG SM and the Tx data path
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since commit a4a710c4a7490587 (pkt_sched: Change PSCHED_SHIFT from 10 to
6) it seems RED/GRED are broken.
red_calc_qavg_from_idle_time() computes a delay in us units, but this
delay is now 16 times bigger than real delay, so the final qavg result
smaller than expected.
Use standard kernel time services since there is no need to obfuscate
them.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ |
|