summaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_probe.c
Commit message (Collapse)AuthorAgeFilesLines
* net: tcp: Remove TCP probe moduleMasami Hiramatsu2018-01-021-301/+0
| | | | | | | | | Remove TCP probe module since jprobe has been deprecated. That function is now replaced by tcp/tcp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: remove redundant argument from tcp_rcv_established()Matvejchikov Ilya2017-07-241-2/+3
| | | | | | | | | The last (4th) argument of tcp_rcv_established() is redundant as it always equals to skb->len and the skb itself is always passed as 2th agrument. There is no reason to have it. Signed-off-by: Ilya V. Matveychikov <matvejchikov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"Eric Dumazet2017-02-211-2/+2
| | | | | | | | | | | | | | | | This reverts commit e70ac171658679ecf6bea4bbd9e9325cd6079d2b. jtcp_rcv_established() is in fact called with hard irq being disabled. Initial bug report from Ricardo Nabinger Sanchez [1] still needs to be investigated, but does not look like a TCP bug. [1] https://www.spinics.net/lists/netdev/msg420960.html Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <xiaolong.ye@intel.com> Cc: Ricardo Nabinger Sanchez <rnsanchez@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: tcp_probe: use spin_lock_bh()Eric Dumazet2017-02-141-2/+2
| | | | | | | | | | | | tcp_rcv_established() can now run in process context. We need to disable BH while acquiring tcp probe spinlock, or risk a deadlock. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ricardo Nabinger Sanchez <rnsanchez@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv4: tcp_probe: Replace timespec with timespec64Deepa Dinamani2016-03-011-4/+4
| | | | | | | | | | | | | | | | | | | | TCP probe log timestamps use struct timespec which is not y2038 safe. Even though timespec might be good enough here as it is used to represent delta time, the plan is to get rid of all uses of timespec in the kernel. Replace with struct timespec64 which is y2038 safe. Prints still use unsigned long format and type. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: whitespace fixesstephen hemminger2014-09-011-4/+2
| | | | | | | | Fix places where there is space before tab, long lines, and awkward if(){, double spacing etc. Add blank line after declaration/initialization. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: switch rtt estimations to usec resolutionEric Dumazet2014-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upcoming congestion controls for TCP require usec resolution for RTT estimations. Millisecond resolution is simply not enough these days. FQ/pacing in DC environments also require this change for finer control and removal of bimodal behavior due to the current hack in tcp_update_pacing_rate() for 'small rtt' TCP_CONG_RTT_STAMP is no longer needed. As Julian Anastasov pointed out, we need to keep user compatibility : tcp_metrics used to export RTT and RTTVAR in msec resolution, so we added RTT_US and RTTVAR_US. An iproute2 patch is needed to use the new attributes if provided by the kernel. In this example ss command displays a srtt of 32 usecs (10Gbit link) lpk51:~# ./ss -i dst lpk52 Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 1 10.246.11.51:42959 10.246.11.52:64614 cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559 Updated iproute2 ip command displays : lpk51:~# ./ip tcp_metrics | grep 10.246.11.52 10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51 Old binary displays : lpk51:~# ip tcp_metrics | grep 10.246.11.52 10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51 With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Larry Brakmo <brakmo@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: ERROR: do not initialise globals to 0 or NULLWeilong Chen2013-12-261-2/+2
| | | | | Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: make lookups simpler and fasterEric Dumazet2013-10-091-18/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Change return value of tcp_rcv_established()Vijay Subramanian2013-09-041-3/+2
| | | | | | | | | | | | | | | tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tcp_probe: adapt tbuf size for recent changesDaniel Borkmann2013-09-041-1/+1
| | | | | | | | | | | | | | With recent changes in tcp_probe module (e.g. f925d0a62d ("net: tcp_probe: add IPv6 support")) we also need to take into account that tbuf needs to be updated as format string will be further expanded. tbuf sits on the stack in tcpprobe_read() function that is invoked when user space reads procfs file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established(). Having a size similarly as in sctp_probe module of 256 bytes is fully sufficient for that, we need theoretical maximum of 252 bytes otherwise we could get truncated. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tcp_probe: allow more advanced ingress filtering by markDaniel Borkmann2013-08-271-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the tcp_probe snooper can either filter packets by a given port (handed to the module via module parameter e.g. port=80) or lets all TCP traffic pass (port=0, default). When a port is specified, the port number is tested against the sk's source/destination port. Thus, if one of them matches, the information will be further processed for the log. As this is quite limited, allow for more advanced filtering possibilities which can facilitate debugging/analysis with the help of the tcp_probe snooper. Therefore, similarly as added to BPF machine in commit 7e75f93e ("pkt_sched: ingress socket filter by mark"), add the possibility to use skb->mark as a filter. If the mark is not being used otherwise, this allows ingress filtering by flow (e.g. in order to track updates from only a single flow, or a subset of all flows for a given port) and other things such as dynamic logging and reconfiguration without removing/re-inserting the tcp_probe module, etc. Simple example: insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1 ... iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 [... sampling interval ...] iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \ --sport 60952 -j MARK --set-mark 8888 The current option to filter by a given port is still being preserved. A similar approach could be done for the sctp_probe module as a follow-up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tcp_probe: add IPv6 supportDaniel Borkmann2013-08-221-10/+44
| | | | | | | | | | | | The tcp_probe currently only supports analysis of IPv4 connections. Therefore, it would be nice to have IPv6 supported as well. Since we have the recently added %pISpc specifier that is IPv4/IPv6 generic, build related sockaddress structures from the flow information and pass this to our format string. Tested with SSH and HTTP sessions on IPv4 and IPv6. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tcp_probe: kprobes: adapt jtcp_rcv_established signatureDaniel Borkmann2013-08-221-1/+8
| | | | | | | | | | | | This patches fixes a rather unproblematic function signature mismatch as the const specifier was missing for the th variable; and next to that it adds a build-time assertion so that future function signature mismatches for kprobes will not end badly, similarly as commit 22222997 ("net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack") did it for SCTP. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tcp_probe: also include rcv_wnd next to snd_wndDaniel Borkmann2013-08-221-2/+4
| | | | | | | | | | | | It is helpful to sometimes know the TCP window sizes of an established socket e.g. to confirm that window scaling is working or to tweak the window size to improve high-latency connections, etc etc. Currently the TCP snooper only exports the send window size, but not the receive window size. Therefore, also add the receive window size to the end of the output line. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: proc: change proc_net_remove to remove_proc_entryGao feng2013-02-181-2/+2
| | | | | | | | | | | | | | proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: proc: change proc_net_fops_create to proc_createGao feng2013-02-181-1/+1
| | | | | | | | | | | | | Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanup unsigned to unsigned intEric Dumazet2012-04-151-1/+1
| | | | | | | Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: fix checkpatch errorsDaniel Baluta2012-04-151-1/+1
| | | | | | | | | Fix checkpatch errors of the following type: * ERROR: "foo * bar" should be "foo *bar" * ERROR: "(foo*)" should be "(foo *)" Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv4: Standardize prefixes for message loggingJoe Perches2012-03-121-1/+3
| | | | | | | | | | | | | | | | Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv4: tcp_probe: cleanup snprintf() useVasiliy Kulikov2010-11-171-2/+2
| | | | | | | | | | | | | | | | snprintf() returns number of bytes that were copied if there is no overflow. This code uses return value as number of copied bytes. Theoretically format string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded up to 163 bytes. In reality tv.tv_sec is just few bytes instead of 20, 2 ports are just 5 bytes each instead of 10, length is 5 bytes instead of 10. The rest is an unstrusted input. Theoretically if tv_sec is big then copy_to_user() would overflow tbuf. tbuf was increased to fit in 163 bytes. snprintf() is used to follow return value semantic. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* llseek: automatically add .llseek fopArnd Bergmann2010-10-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* tcp_probe: avoid modulus operation and wrap fixStephen Hemminger2010-01-251-9/+10
| | | | | | | | | By rounding up the buffer size to power of 2, several expensive modulus operations can be avoided. This patch also solves a bug where the gap need when ring gets full was not being accounted for. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ipv4: Move && and || to end of previous lineJoe Perches2009-11-231-2/+2
| | | | | | | | | | | | | | | | | | On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote: > It should be of the form: > if (x && > y) > > or: > if (x && y) > > Fix patches, rather than complaints, for existing cases where things > do not follow this pattern are certainly welcome. Also collapsed some multiple tabs to single space. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: rename some inet_sock fieldsEric Dumazet2009-10-181-5/+6
| | | | | | | | | | | | | | | | In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: '< 0' test on unsignedRoel Kluin2009-03-131-2/+3
| | | | | | | promote 'cnt' to size_t, to match 'len'. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: replace NIPQUAD() in net/ipv4/ net/ipv6/Harvey Harrison2008-10-311-4/+3
| | | | | | | | Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: correct kcalloc usageMilton Miller2008-07-101-1/+1
| | | | | | | | kcalloc is supposed to be called with the count as its first argument and the element size as the second. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: tcp_probe buffer overflow and incorrect return valueTom Quetchenbach2008-04-241-5/+4
| | | | | | | | | | | | | | | | | | | | | tcp_probe has a bounds-checking bug that causes many programs (less, python) to crash reading /proc/net/tcp_probe. When it outputs a log line to the reader, it only checks if that line alone will fit in the reader's buffer, rather than that line and all the previous lines it has already written. tcpprobe_read also returns the wrong value if copy_to_user fails--it just passes on the return value of copy_to_user (number of bytes not copied), which makes a failure look like a success. This patch fixes the buffer overflow and sets the return value to -EFAULT if copy_to_user fails. Patch is against latest net-2.6; tested briefly and seems to fix the crashes in less and python. Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.YOSHIFUJI Hideaki2008-04-141-1/+1
| | | | | | | And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make /proc/net per network namespaceEric W. Biederman2007-10-101-3/+4
| | | | | | | | | | | | | | | | | | This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* jprobes: remove JPROBE_ENTRY()Michael Ellerman2007-07-191-1/+1
| | | | | | | | | | | | | | | | AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [TCP]: tcp probe add back ssthresh fieldStephen Hemminger2007-07-141-0/+1
| | | | | | | Sangtae noticed the ssthresh got missed. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: tcp probe wraparound handling and other changesStephen Hemminger2007-07-111-70/+124
| | | | | | | | | | | | | Switch from formatting messages in probe routine and copying with kfifo, to using a small circular queue of information and formatting on read. This avoids wraparound issues with kfifo, and saves one copy. Also make sure to state correct license, rather than copying off some other driver I started with. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp_probe: Attach printf attribute properly to printl().David S. Miller2007-06-071-2/+4
| | | | | | | | GCC doesn't like the way Stephen initially did it: net/ipv4/tcp_probe.c:83: warning: empty declaration Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp_probe: use GCC printf attributeStephen Hemminger2007-05-311-1/+2
| | | | | | | The function in tcp_probe is printf like, use GCC to check the args. Sighed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp_probe: a trivial fix for mismatched number of printl arguments.Sangtae Ha2007-05-311-1/+1
| | | | | | | | Just a fix to correct the number of printl arguments. Now, srtt is logging correctly. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp_probe: improvements for net-2.6.22Stephen Hemminger2007-04-251-29/+39
| | | | | | | | Change tcp_probe to use ktime (needed to add one export). Add option to only get events when cwnd changes - from Doug Leith Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] mark struct file_operations const 7Arjan van de Ven2007-02-121-1/+1
| | | | | | | | | | | Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] email change for shemminger@osdl.orgStephen Hemminger2007-01-231-1/+1
| | | | | | | | | Change my email address to reflect OSDL merger. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> [ The irony. Somebody still has his sign-off message hardcoded in a script or his brainstem ;^] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NET]: Fix kfifo_alloc() error check.Akinobu Mita2006-11-251-0/+2
| | | | | | | The return value of kfifo_alloc() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] Kprobes: Make kprobe modules more portableAnanth N Mavinakayanahalli2006-10-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In an effort to make kprobe modules more portable, here is a patch that: o Introduces the "symbol_name" field to struct kprobe. The symbol->address resolution now happens in the kernel in an architecture agnostic manner. 64-bit powerpc users no longer have to specify the ".symbols" o Introduces the "offset" field to struct kprobe to allow a user to specify an offset into a symbol. o The legacy mechanism of specifying the kprobe.addr is still supported. However, if both the kprobe.addr and kprobe.symbol_name are specified, probe registration fails with an -EINVAL. o The symbol resolution code uses kallsyms_lookup_name(). So CONFIG_KPROBES now depends on CONFIG_KALLSYMS o Apparantly kprobe modules were the only legitimate out-of-tree user of the kallsyms_lookup_name() EXPORT. Now that the symbol resolution happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig o Modify tcp_probe.c that uses the kprobe interface so as to make it work on multiple platforms (in its earlier form, the code wouldn't work, say, on powerpc) Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [TCP]: Fix botched memory leak fix to tcpprobe_read().David S. Miller2006-08-131-1/+2
| | | | | | | | | | | | Somehow I clobbered James's original fix and only my subsequent compiler warning change went in for that changeset. Get the real fix in there. Noticed by Jesper Juhl. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: fix memory leak in net/ipv4/tcp_probe.c::tcpprobe_read()James Morris2006-08-021-1/+1
| | | | | | | | | Based upon a patch by Jesper Juhl. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix compile warning in tcp_probe.cDavid S. Miller2006-06-171-1/+3
| | | | | | | | The suseconds_t et al. are not necessarily any particular type on every platform, so cast to unsigned long so that we can use one printf format string and avoid warnings across the board Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: TCP Probe congestion window tracingStephen Hemminger2006-06-171-0/+179
This adds a new module for tracking TCP state variables non-intrusively using kprobes. It has a simple /proc interface that outputs one line for each packet received. A sample usage is to collect congestion window and ssthresh over time graphs. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>