summaryrefslogtreecommitdiffstats
path: root/include/trace
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'trace-v4.20-rc3' of ↵Linus Torvalds2018-11-301-1/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "While rewriting the function graph tracer, I discovered a design flaw that was introduced by a patch that tried to fix one bug, but by doing so created another bug. As both bugs corrupt the output (but they do not crash the kernel), I decided to fix the design such that it could have both bugs fixed. The original fix, fixed time reporting of the function graph tracer when doing a max_depth of one. This was code that can test how much the kernel interferes with userspace. But in doing so, it could corrupt the time keeping of the function profiler. The issue is that the curr_ret_stack variable was being used for two different meanings. One was to keep track of the stack pointer on the ret_stack (shadow stack used by the function graph tracer), and the other use case was the graph call depth. Although, the two may be closely related, where they got updated was the issue that lead to the two different bugs that required the two use cases to be updated differently. The big issue with this fix is that it requires changing each architecture. The good news is, I was able to remove a lot of code that was duplicated within the architectures and place it into a single location. Then I could make the fix in one place. I pushed this code into linux-next to let it settle over a week, and before doing so, I cross compiled all the affected architectures to make sure that they built fine. In the mean time, I also pulled in a patch that fixes the sched_switch previous tasks state output, that was not actually correct" * tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: sched, trace: Fix prev_state output in sched_switch tracepoint function_graph: Have profiler use curr_ret_stack and not depth function_graph: Reverse the order of pushing the ret_stack and the callback function_graph: Move return callback before update of curr_ret_stack function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack function_graph: Make ftrace_push_return_trace() static sparc/function_graph: Simplify with function_graph_enter() sh/function_graph: Simplify with function_graph_enter() s390/function_graph: Simplify with function_graph_enter() riscv/function_graph: Simplify with function_graph_enter() powerpc/function_graph: Simplify with function_graph_enter() parisc: function_graph: Simplify with function_graph_enter() nds32: function_graph: Simplify with function_graph_enter() MIPS: function_graph: Simplify with function_graph_enter() microblaze: function_graph: Simplify with function_graph_enter() arm64: function_graph: Simplify with function_graph_enter() ARM: function_graph: Simplify with function_graph_enter() x86/function_graph: Simplify with function_graph_enter() function_graph: Create function_graph_enter() to consolidate architecture code
| * sched, trace: Fix prev_state output in sched_switch tracepointPavankumar Kondeti2018-11-271-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") tried to fix the problem introduced by a previous commit efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing"). However the prev_state output in sched_switch is still broken. task_state_index() uses fls() which considers the LSB as 1. Left shifting 1 by this value gives an incorrect mapping to the task state. Fix this by decrementing the value returned by __get_task_state() before shifting. Link: http://lkml.kernel.org/r/1540882473-1103-1-git-send-email-pkondeti@codeaurora.org Cc: stable@vger.kernel.org Fixes: 3f5fe9fef5b2 ("sched/debug: Fix task state recording/printout") Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2018-11-191-0/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix some potentially uninitialized variables and use-after-free in kvaser_usb can drier, from Jimmy Assarsson. 2) Fix leaks in qed driver, from Denis Bolotin. 3) Socket leak in l2tp, from Xin Long. 4) RSS context allocation fix in bnxt_en from Michael Chan. 5) Fix cxgb4 build errors, from Ganesh Goudar. 6) Route leaks in ipv6 when removing exceptions, from Xin Long. 7) Memory leak in IDR allocation handling of act_pedit, from Davide Caratti. 8) Use-after-free of bridge vlan stats, from Nikolay Aleksandrov. 9) When MTU is locked, do not force DF bit on ipv4 tunnels. From Sabrina Dubroca. 10) When NAPI cached skb is reused, we must set it to the proper initial state which includes skb->pkt_type. From Eric Dumazet. 11) Lockdep and non-linear SKB handling fix in tipc from Jon Maloy. 12) Set RX queue properly in various tuntap receive paths, from Matthew Cover. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits) tuntap: fix multiqueue rx ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF tipc: don't assume linear buffer when reading ancillary data tipc: fix lockdep warning when reinitilaizing sockets net-gro: reset skb->pkt_type in napi_reuse_skb() tc-testing: tdc.py: Guard against lack of returncode in executed command tc-testing: tdc.py: ignore errors when decoding stdout/stderr ip_tunnel: don't force DF when MTU is locked MAINTAINERS: Add entry for CAKE qdisc net: bridge: fix vlan stats use-after-free on destruction socket: do a generic_file_splice_read when proto_ops has no splice_read net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs" net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs net/sched: act_pedit: fix memory leak when IDR allocation fails net: lantiq: Fix returned value in case of error in 'xrx200_probe()' ipv6: fix a dst leak when removing its exception net: mvneta: Don't advertise 2.5G modes drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo net/mlx4: Fix UBSAN warning of signed integer overflow ...
| * rxrpc: Fix life checkDavid Howells2018-11-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The life-checking function, which is used by kAFS to make sure that a call is still live in the event of a pending signal, only samples the received packet serial number counter; it doesn't actually provoke a change in the counter, rather relying on the server to happen to give us a packet in the time window. Fix this by adding a function to force a ping to be transmitted. kAFS then keeps track of whether there's been a stall, and if so, uses the new function to ping the server, resetting the timeout to allow the reply to come back. If there's a stall, a ping and the call is *still* stalled in the same place after another period, then the call will be aborted. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Fixes: f4d15fb6f99a ("rxrpc: Provide functions for allowing cleaner handling of signals") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | kyber: fix wrong strlcpy() size in trace_kyber_latency()Omar Sandoval2018-11-121-4/+4
|/ | | | | | | | | | | | | When copying to the latency type, we should be passing LATENCY_TYPE_LEN, not DOMAIN_LEN (this isn't a problem in practice because we only pass "total" or "I/O"). Fix it by changing all of the strlcpy() calls to use sizeof(). Fixes: 6c3b7af1c975 ("kyber: add tracepoints") Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'work.afs' of ↵Linus Torvalds2018-11-011-18/+195
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...
| * Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsAl Viro2018-11-012-34/+21
| |\ | | | | | | | | | backmerge to do fixup of iov_iter_kvec() conflict
| * | afs: Probe multiple fileservers simultaneouslyDavid Howells2018-10-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells <dhowells@redhat.com>
| * | afs: Implement YFS support in the fs clientDavid Howells2018-10-241-1/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement support for talking to YFS-variant fileservers in the cache manager and the filesystem client. These implement upgraded services on the same port as their AFS services. YFS fileservers provide expanded capabilities over AFS. Signed-off-by: David Howells <dhowells@redhat.com>
| * | afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFSDavid Howells2018-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase the sizes of the volume ID to 64 bits and the vnode ID (inode number equivalent) to 96 bits to allow the support of YFS. This requires the iget comparator to check the vnode->fid rather than i_ino and i_generation as i_ino is not sufficiently capacious. It also requires this data to be placed into the vnode cache key for fscache. For the moment, just discard the top 32 bits of the vnode ID when returning it though stat. Signed-off-by: David Howells <dhowells@redhat.com>
| * | afs: Add a couple of tracepoints to log I/O errorsDavid Howells2018-10-241-0/+81
| | | | | | | | | | | | | | | | | | | | | Add a couple of tracepoints to log the production of I/O errors within the AFS filesystem. Signed-off-by: David Howells <dhowells@redhat.com>
| * | afs: Set up the iov_iter before calling afs_extract_data()David Howells2018-10-241-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC each time it is called to describe the remaining buffer to be filled. Instead: (1) Put an iterator in the afs_call struct. (2) Set the iterator for each marshalling stage to load data into the appropriate places. A number of convenience functions are provided to this end (eg. afs_extract_to_buf()). This iterator is then passed to afs_extract_data(). (3) Use the new ITER_DISCARD iterator to discard any excess data provided by FetchData. Signed-off-by: David Howells <dhowells@redhat.com>
| * | afs: Better tracing of protocol errorsDavid Howells2018-10-241-8/+46
| | | | | | | | | | | | | | | | | | | | | Include the site of detection of AFS protocol errors in trace lines to better be able to determine what went wrong. Signed-off-by: David Howells <dhowells@redhat.com>
* | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2018-10-261-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) hugetlbfs: dirty pages as they are added to pagecache mm: export add_swap_extent() mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t mm/gup: cache dev_pagemap while pinning pages Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" mm: return zero_resv_unavail optimization mm: zero remaining unavailable struct pages tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option tools/testing/selftests/vm/gup_benchmark.c: allow user specified file tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm/gup_benchmark.c: add additional pinning methods mm/gup_benchmark.c: time put_page() mm: don't raise MEMCG_OOM event due to failed high-order allocation mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock ...
| * | | mm: workingset: tell cache transitions from workingset thrashingJohannes Weiner2018-10-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refaults happen during transitions between workingsets as well as in-place thrashing. Knowing the difference between the two has a range of applications, including measuring the impact of memory shortage on the system performance, as well as the ability to smarter balance pressure between the filesystem cache and the swap-backed workingset. During workingset transitions, inactive cache refaults and pushes out established active cache. When that active cache isn't stale, however, and also ends up refaulting, that's bonafide thrashing. Introduce a new page flag that tells on eviction whether the page has been active or not in its lifetime. This bit is then stored in the shadow entry, to classify refaults as transitioning or thrashing. How many page->flags does this leave us with on 32-bit? 20 bits are always page flags 21 if you have an MMU 23 with the zone bits for DMA, Normal, HighMem, Movable 29 with the sparsemem section bits 30 if PAE is enabled 31 with this patch. So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes. If that's not enough, the system can switch to discontigmem and re-gain the 6 or 7 sparsemem section bits. Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2018-10-262-34/+21
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix the NFSv4.1 r/wsize sanity checking - Reset the RPC/RDMA credit grant properly after a disconnect - Fix a missed page unlock after pg_doio() Features and optimisations: - Overhaul of the RPC client socket code to eliminate a locking bottleneck and reduce the latency when transmitting lots of requests in parallel. - Allow parallelisation of the RPCSEC_GSS encoding of an RPC request. - Convert the RPC client socket receive code to use iovec_iter() for improved efficiency. - Convert several NFS and RPC lookup operations to use RCU instead of taking global locks. - Avoid the need for BH-safe locks in the RPC/RDMA back channel. Bugfixes and cleanups: - Fix lock recovery during NFSv4 delegation recalls - Fix the NFSv4 + NFSv4.1 "lookup revalidate + open file" case. - Fixes for the RPC connection metrics - Various RPC client layer cleanups to consolidate stream based sockets - RPC/RDMA connection cleanups - Simplify the RPC/RDMA cleanup after memory operation failures - Clean ups for NFS v4.2 copy completion and NFSv4 open state reclaim" * tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (97 commits) SUNRPC: Convert the auth cred cache to use refcount_t SUNRPC: Convert auth creds to use refcount_t SUNRPC: Simplify lookup code SUNRPC: Clean up the AUTH cache code NFS: change sign of nfs_fh length sunrpc: safely reallow resvport min/max inversion nfs: remove redundant call to nfs_context_set_write_error() nfs: Fix a missed page unlock after pg_doio() SUNRPC: Fix a compile warning for cmpxchg64() NFSv4.x: fix lock recovery during delegation recall SUNRPC: use cmpxchg64() in gss_seq_send64_fetch_and_inc() xprtrdma: Squelch a sparse warning xprtrdma: Clean up xprt_rdma_disconnect_inject xprtrdma: Add documenting comments xprtrdma: Report when there were zero posted Receives xprtrdma: Move rb_flags initialization xprtrdma: Don't disable BH's in backchannel server xprtrdma: Remove memory address of "ep" from an error message xprtrdma: Rename rpcrdma_qp_async_error_upcall xprtrdma: Simplify RPC wake-ups on connect ...
| * | | Merge tag 'nfs-rdma-for-4.20-1' of ↵Trond Myklebust2018-10-181-9/+9
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/anna/linux-nfs NFS RDMA client updates for Linux 4.20 Stable bugfixes: - Reset credit grant properly after a disconnect Other bugfixes and cleanups: - xprt_release_rqst_cong is called outside of transport_lock - Create more MRs at a time and toss out old ones during recovery - Various improvements to the RDMA connection and disconnection code: - Improve naming of trace events, functions, and variables - Add documenting comments - Fix metrics and stats reporting - Fix a tracepoint sparse warning Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| | * | | xprtrdma: Squelch a sparse warningChuck Lever2018-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | linux/include/trace/events/rpcrdma.h:501:1: warning: expression using sizeof bool linux/include/trace/events/rpcrdma.h:501:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) Fixes: ab03eff58eb5 ("xprtrdma: Add trace points in RPC Call ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | | xprtrdma: Rename rpcrdma_qp_async_error_upcallChuck Lever2018-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | | xprtrdma: Rename rpcrdma_conn_upcallChuck Lever2018-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Use a function name that is consistent with the RDMA core API and with other consumers. Because this is a function that is invoked from outside the rpcrdma.ko module, add an appropriate documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | | xprtrdma: Name MR trace events consistentlyChuck Lever2018-10-021-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the names of trace events related to MRs so that it's easy to enable these with a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | | xprtrdma: Explicitly resetting MRs is no longer necessaryChuck Lever2018-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a memory operation fails, the MR's driver state might not match its hardware state. The only reliable recourse is to dereg the MR. This is done in ->ro_recover_mr, which then attempts to allocate a fresh MR to replace the released MR. Since commit e2ac236c0b651 ("xprtrdma: Allocate MRs on demand"), xprtrdma dynamically allocates MRs. It can add more MRs whenever they are needed. That makes it possible to simply release an MR when a memory operation fails, instead of "recovering" it. It will automatically be replaced by the on-demand MR allocator. This commit is a little larger than I wanted, but it replaces ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the rb_stale_mrs list with a generic work queue. Since MRs are no longer orphaned, the mrs_orphaned metric is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()Trond Myklebust2018-09-301-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for sharing with AF_LOCAL. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | | SUNRPC: Simplify TCP receive code by switching to using iteratorsTrond Myklebust2018-09-301-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of this code should also be reusable with other socket types. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | | | SUNRPC: Rename TCP receive-specific state variablesTrond Myklebust2018-09-301-5/+5
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we will want to introduce similar TCP state variables for the transmission of requests, let's rename the existing ones to label that they are for the receive side. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | / / block: add a report_zones methodChristoph Hellwig2018-10-251-1/+0
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dispatching a report zones command through the request queue is a major pain due to the command reply payload rewriting necessary. Given that blkdev_report_zones() is executing everything synchronously, implement report zones as a block device file operation instead, allowing major simplification of the code in many places. sd, null-blk, dm-linear and dm-flakey being the only block device drivers supporting exposing zoned block devices, these drivers are modified to provide the device side implementation of the report_zones() block device file operation. For device mappers, a new report_zones() target type operation is defined so that the upper block layer calls blkdev_report_zones() can be propagated down to the underlying devices of the dm targets. Implementation for this new operation is added to the dm-linear and dm-flakey targets. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Changed method block_device argument to gendisk * Various bug fixes and improvements * Added support for null_blk, dm-linear and dm-flakey. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2018-10-241-22/+77
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - further restructure ext4 documentation - fix up ext4's delayed allocation for bigalloc file systems - fix up some syzbot-detected races in EXT4_IOC_MOVE_EXT, EXT4_IOC_SWAP_BOOT, and ext4_remount - ... and a few other miscellaneous bugs and optimizations. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) ext4: fix use-after-free race in ext4_remount()'s error path ext4: cache NULL when both default_acl and acl are NULL docs: promote the ext4 data structures book to top level docs: move ext4 administrative docs to admin-guide/ jbd2: fix use after free in jbd2_log_do_checkpoint() ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR ext4: fix setattr project check in fssetxattr ioctl docs: make ext4 readme tables readable docs: fix ext4 documentation table formatting problems docs: generate a separate ext4 pdf file from the documentation ext4: convert fault handler to use vm_fault_t type ext4: initialize retries variable in ext4_da_write_inline_data_begin() ext4: fix EXT4_IOC_SWAP_BOOT ext4: fix build error when DX_DEBUG is defined ext4: fix argument checking in EXT4_IOC_MOVE_EXT ext4: fix reserved cluster accounting at page invalidation time ext4: adjust reserved cluster count when removing extents ext4: reduce reserved cluster count by number of allocated clusters ext4: fix reserved cluster accounting at delayed write time ext4: add new pending reservation mechanism ...
| * | | ext4: adjust reserved cluster count when removing extentsEric Whitney2018-10-011-20/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree. Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied. Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it. Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: fix reserved cluster accounting at delayed write timeEric Whitney2018-10-011-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: generalize extents status tree search functionsEric Whitney2018-10-011-2/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock. Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | Merge tag 'for-4.20-part1-tag' of ↵Linus Torvalds2018-10-241-7/+29
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This is the first batch with fixes and some nice performance improvements. Preliminary results show eg. more files/sec in fsmark, better perf on multi-threaded workloads (filebench, dbench), fewer context switches and overall better memory allocation characteristics (multiple benchmarks). Apart from general performance, there's an improvement for qgroups + balance workload that's been troubling our users. Note for stable: there are 20+ patches tagged for stable, out of 90. Not all of them apply cleanly on all stable versions but the conflicts are mostly due to simple cleanups and resolving should be obvious. The fixes are otherwise independent. Performance improvements: - transition between blocking and spinning modes of path is gone, which originally resulted to more unnecessary wakeups and updates to the path locks, the effects are measurable and improve latency and scalability - qgroups: first batch of changes that should speedup balancing with qgroups on, skip quota accounting on unchanged subtrees, overall gain is about 30+% in runtime - use rb-tree with cached first node for several structures, small improvement to avoid pointer chasing Fixes: - trim - fix: some blockgroups could have been missed if their logical address was past the total filesystem size (ie. after a lot of balancing) - better error reporting, after processing blockgroups and whole device - fix: continue trimming block groups after an error is encountered - check for trim support of the device earlier and avoid some unnecessary work - less interaction with transaction commit that improves latency on slower storage (eg. image files over NFS) - fsync - fix warning when replaying log after fsync of a O_TMPFILE - fix wrong dentries after fsync of file that got its parent replaced - qgroups: fix rescan that might misc some dirty groups - don't clean dirty pages during buffered writes, this could lead to lost updates in some corner cases - some block groups could have been delayed in creation, if the allocation triggered another one - error handling improvements Cleanups: - removed unused struct members and variables - function return type cleanups - delayed refs code refactoring - protect against deadlock that could be caused by crafted image that tries to allocate from a tree that's locked already" * tag 'for-4.20-part1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (93 commits) btrfs: switch return_bigger to bool in find_ref_head btrfs: remove fs_info from btrfs_should_throttle_delayed_refs btrfs: remove fs_info from btrfs_check_space_for_delayed_refs btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head btrfs: qgroup: move the qgroup->members check out from (!qgroup)'s else branch btrfs: relocation: Remove redundant tree level check btrfs: relocation: Cleanup while loop using rbtree_postorder_for_each_entry_safe btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled Btrfs: fix wrong dentries after fsync of file that got its parent replaced Btrfs: fix warning when replaying log after fsync of a tmpfile btrfs: drop min_size from evict_refill_and_join btrfs: assert on non-empty delayed iputs btrfs: make sure we create all new block groups btrfs: reset max_extent_size on clear in a bitmap btrfs: protect space cache inode alloc with GFP_NOFS btrfs: release metadata before running delayed refs Btrfs: kill btrfs_clear_path_blocking btrfs: dev-replace: remove pointless assert in write unlock btrfs: dev-replace: move replace members out of fs_info ...
| * | | btrfs: qgroup: Introduce trace event to analyse the number of dirty extents ↵Qu Wenruo2018-10-151-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | accounted Number of qgroup dirty extents is directly linked to the performance overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to record how many dirty extents is processed in btrfs_qgroup_account_extents(). This will be pretty handy to analyze later balance performance improvement. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: Remove 'objectid' member from struct btrfs_rootMisono Tomohiro2018-10-151-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two members in struct btrfs_root which indicate root's objectid: objectid and root_key.objectid. They are both set to the same value in __setup_root(): static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, u64 objectid) { ... root->objectid = objectid; ... root->root_key.objectid = objecitd; ... } and not changed to other value after initialization. grep in btrfs directory shows both are used in many places: $ grep -rI "root->root_key.objectid" | wc -l 133 $ grep -rI "root->objectid" | wc -l 55 (4.17, inc. some noise) It is confusing to have two similar variable names and it seems that there is no rule about which should be used in a certain case. Since ->root_key itself is needed for tree reloc tree, let's remove 'objecitd' member and unify code to use ->root_key.objectid in all places. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | | Merge branch 'siginfo-linus' of ↵Linus Torvalds2018-10-241-4/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
| * | | | signal: Distinguish between kernel_siginfo and siginfoEric W. Biederman2018-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * | | | signal: Remove SEND_SIG_FORCEDEric W. Biederman2018-09-111-2/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no more users of SEND_SIG_FORCED so it may be safely removed. Remove the definition of SEND_SIG_FORCED, it's use in is_si_special, it's use in TP_STORE_SIGINFO, and it's use in __send_signal as without any users the uses of SEND_SIG_FORCED are now unncessary. This makes the code simpler, easier to understand and use. Users of signal sending functions now no longer need to ask themselves do I need to use SEND_SIG_FORCED. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-10-241-2/+5
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Add VF IPSEC offload support in ixgbe, from Shannon Nelson. 2) Add zero-copy AF_XDP support to i40e, from Björn Töpel. 3) All in-tree drivers are converted to {g,s}et_link_ksettings() so we can get rid of the {g,s}et_settings ethtool callbacks, from Michal Kubecek. 4) Add software timestamping to veth driver, from Michael Walle. 5) More work to make packet classifiers and actions lockless, from Vlad Buslov. 6) Support sticky FDB entries in bridge, from Nikolay Aleksandrov. 7) Add ipv6 version of IP_MULTICAST_ALL sockopt, from Andre Naujoks. 8) Support batching of XDP buffers in vhost_net, from Jason Wang. 9) Add flow dissector BPF hook, from Petar Penkov. 10) i40e vf --> generic iavf conversion, from Jesse Brandeburg. 11) Add NLA_REJECT netlink attribute policy type, to signal when users provide attributes in situations which don't make sense. From Johannes Berg. 12) Switch TCP and fair-queue scheduler over to earliest departure time model. From Eric Dumazet. 13) Improve guest receive performance by doing rx busy polling in tx path of vhost networking driver, from Tonghao Zhang. 14) Add per-cgroup local storage to bpf 15) Add reference tracking to BPF, from Joe Stringer. The verifier can now make sure that references taken to objects are properly released by the program. 16) Support in-place encryption in TLS, from Vakul Garg. 17) Add new taprio packet scheduler, from Vinicius Costa Gomes. 18) Lots of selftests additions, too numerous to mention one by one here but all of which are very much appreciated. 19) Support offloading of eBPF programs containing BPF to BPF calls in nfp driver, frm Quentin Monnet. 20) Move dpaa2_ptp driver out of staging, from Yangbo Lu. 21) Lots of u32 classifier cleanups and simplifications, from Al Viro. 22) Add new strict versions of netlink message parsers, and enable them for some situations. From David Ahern. 23) Evict neighbour entries on carrier down, also from David Ahern. 24) Support BPF sk_msg verdict programs with kTLS, from Daniel Borkmann and John Fastabend. 25) Add support for filtering route dumps, from David Ahern. 26) New igc Intel driver for 2.5G parts, from Sasha Neftin et al. 27) Allow vxlan enslavement to bridges in mlxsw driver, from Ido Schimmel. 28) Add queue and stack map types to eBPF, from Mauricio Vasquez B. 29) Add back byte-queue-limit support to r8169, with all the bug fixes in other areas of the driver it works now! From Florian Westphal and Heiner Kallweit. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2147 commits) tcp: add tcp_reset_xmit_timer() helper qed: Fix static checker warning Revert "be2net: remove desc field from be_eq_obj" Revert "net: simplify sock_poll_wait" net: socionext: Reset tx queue in ndo_stop net: socionext: Add dummy PHY register read in phy_write() net: socionext: Stop PHY before resetting netsec net: stmmac: Set OWN bit for jumbo frames arm64: dts: stratix10: Support Ethernet Jumbo frame tls: Add maintainers net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode octeontx2-af: Support for NIXLF's UCAST/PROMISC/ALLMULTI modes octeontx2-af: Support for setting MAC address octeontx2-af: Support for changing RSS algorithm octeontx2-af: NIX Rx flowkey configuration for RSS octeontx2-af: Install ucast and bcast pkt forwarding rules octeontx2-af: Add LMAC channel info to NIXLF_ALLOC response octeontx2-af: NPC MCAM and LDATA extract minimal configuration octeontx2-af: Enable packet length and csum validation octeontx2-af: Support for VTAG strip and capture ...
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-10-121-0/+1
| |\ \ \ | | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-10-061-27/+0
| |\ \ \
| * \ \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-10-031-3/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net' overlapped the renaming of a netlink attribute in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | tcp: expose sk_state in tcp_retransmit_skb tracepointYafang Shao2018-09-261-2/+5
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After sk_state exposed, we can get in which state this retransmission occurs. That could give us more detail for dignostic. For example, if this retransmission occurs in SYN_SENT state, it may also indicates that the syn packet may be dropped on the remote peer due to syn backlog queue full and then we could check the remote peer. BTW,SYNACK retransmission is traced in tcp_retransmit_synack tracepoint. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2018-10-231-3/+8
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes are: - Migrate CPU-intense 'misfit' tasks on asymmetric capacity systems, to better utilize (much) faster 'big core' CPUs. (Morten Rasmussen, Valentin Schneider) - Topology handling improvements, in particular when CPU capacity changes and related load-balancing fixes/improvements (Morten Rasmussen) - ... plus misc other improvements, fixes and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) sched/completions/Documentation: Add recommendation for dynamic and ONSTACK completions sched/completions/Documentation: Clean up the document some more sched/completions/Documentation: Fix a couple of punctuation nits cpu/SMT: State SMT is disabled even with nosmt and without "=force" sched/core: Fix comment regarding nr_iowait_cpu() and get_iowait_load() sched/fair: Remove setting task's se->runnable_weight during PELT update sched/fair: Disable LB_BIAS by default sched/pelt: Fix warning and clean up IRQ PELT config sched/topology: Make local variables static sched/debug: Use symbolic names for task state constants sched/numa: Remove unused numa_stats::nr_running field sched/numa: Remove unused code from update_numa_stats() sched/debug: Explicitly cast sched_feat() to bool sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains sched/fair: Don't move tasks to lower capacity CPUs unless necessary sched/fair: Set rq->rd->overload when misfit sched/fair: Wrap rq->rd->overload accesses with READ/WRITE_ONCE() sched/core: Change root_domain->overload type to int sched/fair: Change 'prefer_sibling' type to bool sched/fair: Kick nohz balance if rq->misfit_task_load ...
| * | | | | sched/debug: Use symbolic names for task state constantsUwe Kleine-König2018-09-101-3/+8
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | include/trace/events/sched.h includes <linux/sched.h> (via <linux/sched/numa_balancing.h>) and so knows about the TASK_* constants used to interpret .prev_state. So instead of duplicating the magic numbers make use of the defined macros to ease understanding the mapping from state bits to letters which isn't completely intuitive for an outsider. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel@pengutronix.de Link: http://lkml.kernel.org/r/20180905093636.24068-1-u.kleine-koenig@pengutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2018-10-231-13/+12
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest change in this cycle is the conclusion of the big 'simplify RCU to two primary flavors' consolidation work - i.e. there's a single RCU flavor for any kernel variant (PREEMPT and !PREEMPT): - Consolidate the RCU-bh, RCU-preempt, and RCU-sched flavors into a single flavor similar to RCU-sched in !PREEMPT kernels and into a single flavor similar to RCU-preempt (but also waiting on preempt-disabled sequences of code) in PREEMPT kernels. This branch also includes a refactoring of rcu_{nmi,irq}_{enter,exit}() from Byungchul Park. - Now that there is only one RCU flavor in any given running kernel, the many "rsp" pointers are no longer required, and this cleanup series removes them. - This branch carries out additional cleanups made possible by the RCU flavor consolidation, including inlining now-trivial functions, updating comments and definitions, and removing now-unneeded rcutorture scenarios. - Now that there is only one flavor of RCU in any running kernel, there is also only on rcu_data structure per CPU. This means that the rcu_dynticks structure can be merged into the rcu_data structure, a task taken on by this branch. This branch also contains a -rt-related fix from Mike Galbraith. There were also other updates: - Documentation updates, including some good-eye catches from Joel Fernandes. - SRCU updates, most notably changes enabling call_srcu() to be invoked very early in the boot sequence. - Torture-test updates, including some preliminary work towards making rcutorture better able to find problems that result in insufficient grace-period forward progress. - Initial changes to RCU to better promote forward progress of grace periods, including fixing a bug found by Marius Hillenbrand and David Woodhouse, with the fix suggested by Peter Zijlstra" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) srcu: Make early-boot call_srcu() reuse workqueue lists rcutorture: Test early boot call_srcu() srcu: Make call_srcu() available during very early boot rcu: Convert rcu_state.ofl_lock to raw_spinlock_t rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks rcu: Switch dyntick nesting counters to rcu_data structure rcu: Switch urgent quiescent-state requests to rcu_data structure rcu: Switch lazy counts to rcu_data structure rcu: Switch last accelerate/advance to rcu_data structure rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure rcu: Merge rcu_dynticks structure into rcu_data structure rcu: Remove unused rcu_dynticks_snap() from Tiny RCU rcu: Convert "1UL << x" to "BIT(x)" rcu: Avoid resched_cpu() when rescheduling the current CPU rcu: More aggressively enlist scheduler aid for nohz_full CPUs rcu: Compute jiffies_till_sched_qs from other kernel parameters rcu: Provide functions for determining if call_rcu() has been invoked rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure rcu: Motivate Tiny RCU forward progress ...
| * | | | | rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structurePaul E. McKenney2018-08-301-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ->rcu_qs_ctr counter was intended to allow providing a lightweight report of a quiescent state to all RCU flavors. But now that there is only one flavor of RCU in any one running kernel, there is no point in having this feature. This commit therefore removes the ->rcu_qs_ctr field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field from the rcu_data structure. This results in the "rqc" option to the rcu_fqs trace event no longer being used, so this commit also removes the "rqc" description from the header comment. While in the neighborhood, this commit also causes the forward-progress request .rcu_need_heavy_qs be set one jiffies_till_sched_qs interval later in the grace period than the first setting of .rcu_urgent_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * | | | | rcu: Inline _rcu_barrier() into its sole remaining callerPaul E. McKenney2018-08-301-10/+10
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because rcu_barrier() is a one-line wrapper function for _rcu_barrier() and because nothing else calls _rcu_barrier(), this commit inlines _rcu_barrier() into rcu_barrier(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | | | Merge tag 'hwmon-for-v4.20' of ↵Linus Torvalds2018-10-231-0/+71
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - Add support for trace events to hwmon core - Add support for NCT6797D, NCT6798D, MAX31725/6, LTM4686 - Support all AMD Family 15h Model 6xh and Model 7xh processors in k10temp driver - Convert ina3221 driver to _info API - Fixes, cleanups, and improvements in various drivers * tag 'hwmon-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (46 commits) hwmon: (pmbus) Fix page count auto-detection. hwmon: (pmbus) remove redundant 'default n' from Kconfig hwmon: (core) Add trace events to _attr_show/store functions hwmon: (ina3221) Use _info API to register hwmon device hwmon: (npcm-750-pwm-fan) Change initial pwm target to 255 hwmon: (ina3221) Validate shunt resistor value from DT hwmon: (tmp421) make const array 'names' static hwmon: (core) Add hwmon_in_enable attribute hwmon: (ina3221) mark PM functions as __maybe_unused hwmon: (ina3221) Read channel input source info from DT dt-bindings: hwmon: Add ina3221 documentation hwmon: (ina3221) Add suspend and resume functions hwmon: (ina3221) Fix INA3221_CONFIG_MODE macros hwmon: (ina3221) Add INA3221_CONFIG to volatile_table MAINTAINERS: Update PMBUS maintainer entry hwmon: (pwm-fan) Set fan speed to 0 on suspend hwmon: (pwm-fan) Silence error on probe deferral hwmon: (scpi-hwmon) remove redundant continue hwmon: (nct6775) Add support for NCT6798D hwmon: (nct6775) Add support for NCT6797D ...
| * | | | | hwmon: (core) Add trace events to _attr_show/store functionsNicolin Chen2018-10-111-0/+71
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trace events are useful for people who collect data from the Ftrace outputs. There're people who analyse the relationship of cpufreq, thermal and hwmon (power/voltage/current) using the convenient and timestamped Ftrace outputs, while unlike cpufreq and thermal subsystems the hwmon does not have trace events supported yet. So this patch adds initial trace events for the hwmon core. To call hwmon_attr_base() for aligned attr index numbers, it also moves the function upward. Ftrace outputs: ...: hwmon_attr_show_string: index=2, attr_name=in2_label, val=VDD_5V ...: hwmon_attr_show: index=2, attr_name=in2_input, val=5112 ...: hwmon_attr_show: index=2, attr_name=curr2_input, val=440 Note that the _attr_show and _attr_store functions are tied to the _with_info API. So a hwmon driver requiring the trace events feature should use _with_info API to register a hwmon device. Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
* | | | | Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-blockLinus Torvalds2018-10-221-0/+96
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "This is the main pull request for block changes for 4.20. This contains: - Series enabling runtime PM for blk-mq (Bart). - Two pull requests from Christoph for NVMe, with items such as; - Better AEN tracking - Multipath improvements - RDMA fixes - Rework of FC for target removal - Fixes for issues identified by static checkers - Fabric cleanups, as prep for TCP transport - Various cleanups and bug fixes - Block merging cleanups (Christoph) - Conversion of drivers to generic DMA mapping API (Christoph) - Series fixing ref count issues with blkcg (Dennis) - Series improving BFQ heuristics (Paolo, et al) - Series improving heuristics for the Kyber IO scheduler (Omar) - Removal of dangerous bio_rewind_iter() API (Ming) - Apply single queue IPI redirection logic to blk-mq (Ming) - Set of fixes and improvements for bcache (Coly et al) - Series closing a hotplug race with sysfs group attributes (Hannes) - Set of patches for lightnvm: - pblk trace support (Hans) - SPDX license header update (Javier) - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0 specs behind a common core interface. (Javier, Matias) - Enable pblk to use a common interface to retrieve chunk metadata (Matias) - Bug fixes (Various) - Set of fixes and updates to the blk IO latency target (Josef) - blk-mq queue number updates fixes (Jianchao) - Convert a bunch of drivers from the old legacy IO interface to blk-mq. This will conclude with the removal of the legacy IO interface itself in 4.21, with the rest of the drivers (me, Omar) - Removal of the DAC960 driver. The SCSI tree will introduce two replacement drivers for this (Hannes)" * tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits) block: setup bounce bio_sets properly blkcg: reassociate bios when make_request() is called recursively blkcg: fix edge case for blk_get_rl() under memory pressure nvme-fabrics: move controller options matching to fabrics nvme-rdma: always have a valid trsvcid mtip32xx: fully switch to the generic DMA API rsxx: switch to the generic DMA API umem: switch to the generic DMA API sx8: switch to the generic DMA API sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code skd: switch to the generic DMA API ubd: remove use of blk_rq_map_sg nvme-pci: remove duplicate check drivers/block: Remove DAC960 driver nvme-pci: fix hot removal during error handling nvmet-fcloop: suppress a compiler warning nvme-core: make implicit seed truncation explicit nvmet-fc: fix kernel-doc headers nvme-fc: rework the request initialization code nvme-fc: introduce struct nvme_fcp_op_w_sgl ...
| * | | | kyber: add tracepointsOmar Sandoval2018-09-271-0/+96
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When debugging Kyber, it's really useful to know what latencies we've been having, how the domain depths have been adjusted, and if we've actually been throttling. Add three tracepoints, kyber_latency, kyber_adjust, and kyber_throttled, to record that. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>