path: root/net/rds/bind.c
Commit message    Author    Date    Files    Lines
* rds: Reset rs->rs_bound_addr in rds_add_bound() failure path    Sowmini Varadhan    2017-12-27    1    -0/+1
    If the rds_sock is not added to the bind_hash_table, we must reset
    rs_bound_addr so that rds_remove_bound will not trip on this rds_sock.
    rds_add_bound() does a rds_sock_put() in this failure path, so failing
    to reset rs_bound_addr will result in a socket refcount bug, and will
    trigger a WARN_ON with the stack shown below when the application
    subsequently tries to close the PF_RDS socket.

        WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 rds_sock_destruct+0x15/0x30 [rds]
        :
        __sk_destruct+0x21/0x190
        rds_remove_bound.part.13+0xb6/0x140 [rds]
        rds_release+0x71/0x120 [rds]
        sock_release+0x1a/0x70
        sock_close+0xe/0x20
        __fput+0xd5/0x210
        task_work_run+0x82/0xa0
        do_exit+0x2ce/0xb30
        ? syscall_trace_enter+0x1cc/0x2b0
        do_group_exit+0x39/0xa0
        SyS_exit_group+0x10/0x10
        do_syscall_64+0x61/0x1a0

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
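    A minimal sketch of the failure path being fixed; the names follow the
    RDS sources loosely and the surrounding locking is omitted, so treat it
    as an illustration rather than the actual patch:

        /* Sketch: if the hash insert fails, undo every piece of bind state,
         * including rs_bound_addr, so a later rds_remove_bound() is a no-op. */
        rs->rs_bound_addr = addr;
        rs->rs_bound_port = port;
        rds_sock_addref(rs);                    /* reference held by the bind table */

        ret = rhashtable_insert_fast(&bind_hash_table, &rs->rs_bound_node,
                                     ht_parms);
        if (ret) {
                rds_sock_put(rs);               /* drop the table's reference */
                rs->rs_bound_addr = 0;          /* the one-line fix */
        }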
* RDS: make rhashtable_params const    Bhumika Goyal    2017-08-28    1    -1/+1
    Make this const as it is either used during a copy operation or passed
    to a const argument of the function rhltable_init.

    Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
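    A sketch of the pattern, assuming a hypothetical table and key layout
    (only the const qualifier is the point of the change):

        #include <linux/rhashtable.h>

        /* Read-only parameters can live in .rodata; rhltable_init() takes
         * them through a const pointer and never modifies them. */
        static const struct rhashtable_params bind_ht_params = {
                .key_len     = sizeof(u64),
                .key_offset  = offsetof(struct rds_sock, rs_bound_key),
                .head_offset = offsetof(struct rds_sock, rs_bound_node),
        };

        static struct rhltable bind_hash_table;

        static int rds_bind_table_init(void)
        {
                return rhltable_init(&bind_hash_table, &bind_ht_params);
        }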
* RDS: log the address on bind failure    Santosh Shilimkar    2017-01-02    1    -2/+2
    It's useful to know the IP address when RDS fails to bind a connection.
    Thus, add it to the error message.

    Orabug: 21894138

    Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
    Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
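    For context, the kernel printk extension %pI4 prints a __be32 IPv4
    address in dotted-quad form, so the change amounts to something like
    the following sketch (the exact wording of the message is approximate):

        printk_ratelimited(KERN_INFO "RDS: %s could not find a transport for %pI4, "
                           "load rds_tcp or rds_rdma?\n",
                           __func__, &sin->sin_addr.s_addr);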
* RDS: TCP: Enable multipath RDS for TCP    Sowmini Varadhan    2016-07-15    1    -0/+6
    Use RDS probe-ping to compute how many paths may be used with the peer,
    and to synchronously start the multiple paths. If multipath RDS is
    supported by the transport, hash outgoing traffic to one of the
    multiple paths in rds_sendmsg().

    CC: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
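    A purely illustrative sketch of the send-side selection; the real
    rds_sendmsg() and connection-path code is more involved, and the names
    below are hypothetical:

        /* Spread flows across the paths negotiated by the probe-ping,
         * keeping any single flow on a single path. */
        static int rds_pick_path(u32 flow_hash, int npaths)
        {
                return npaths > 1 ? flow_hash % npaths : 0;
        }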
* RDS: convert bind hash table to re-sizable hashtable    santosh.shilimkar@oracle.com    2015-11-02    1    -82/+43
    To further improve RDS connection scalability on massive systems where
    the number of sockets grows into the tens of thousands, a larger bind
    hashtable is needed. A pre-allocated 8K or 16K table is not very
    flexible in terms of memory utilisation. The rhashtable infrastructure
    gives us the flexibility to grow the hashtable based on use, and also
    provides efficient built-in bucket (chain) handling.

    Reviewed-by: David Miller <davem@davemloft.net>
    Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
    Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
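    The shape of the resulting code, sketched with approximate names (the
    real key packs the bound address and port into a single value):

        /* The rhashtable grows and shrinks on its own; bind and lookup
         * become plain insert/lookup calls instead of open-coded bucket walks. */
        ret = rhashtable_insert_fast(&bind_hash_table, &rs->rs_bound_node, ht_parms);
        ...
        rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);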
* RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.    Sowmini Varadhan    2015-10-13    1    -1/+8
    The IP address passed to rds_bind() should be vetted by the transport's
    ->laddr_check() when the transport was explicitly bound beforehand.
    This avoids cases where, for example, the application has asked for an
    IB transport but the IP address passed to bind() is only usable on
    ethernet interfaces.

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
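    The check described above, sketched; the error code and field names
    approximate the RDS sources rather than reproduce the patch:

        if (rs->rs_transport) {                 /* transport previously bound */
                trans = rs->rs_transport;
                if (trans->laddr_check(sock_net(sock->sk),
                                       sin->sin_addr.s_addr) != 0) {
                        ret = -EADDRNOTAVAIL;   /* address unusable on this transport */
                        goto out;
                }
        }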
* RDS: Use per-bucket rw lock for bind hash-table    Santosh Shilimkar    2015-09-30    1    -15/+32
    One global lock protecting a hash table with 1024 buckets isn't
    efficient, and it shows up on massive systems with huge numbers of RDS
    sockets serving multiple databases. The perf data clearly highlights
    the contention on this rw lock in such workloads.

    When the contention gets bad enough, the code backs off on the lock
    acquisition while interrupts are still disabled, which makes the system
    sluggish and eventually leads to all sorts of bad behaviour.

    The simple fix is to move the lock into the hash bucket and use a
    per-bucket lock to improve scalability.

    Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
    Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
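    The layout change, sketched with illustrative names:

        /* One rwlock per bucket: readers of different buckets no longer
         * contend, and writers only serialize within their own bucket. */
        struct rds_bind_bucket {
                rwlock_t          lock;
                struct hlist_head head;
        };

        static struct rds_bind_bucket bind_hash_table[BIND_HASH_SIZE];

        static struct rds_bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
        {
                return &bind_hash_table[jhash_2words((__force u32)addr,
                                                     (__force u32)port, 0) &
                                        (BIND_HASH_SIZE - 1)];
        }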
* RDS: fix rds_sock reference bug while doing bind    Santosh Shilimkar    2015-09-30    1    -5/+11
    One needs to take an rds_sock reference while using it and release it
    once done with it. The rds_add_bound() code path does not do that, so
    fix it.

    Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
    Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
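    The ownership rule the fix establishes, sketched:

        /* The bind table holds its own reference for as long as the socket
         * is hashed; the reference is dropped when the socket is removed. */
        rds_sock_addref(rs);            /* in the add-to-table path */
        ...
        rds_sock_put(rs);               /* in the remove-from-table path */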
* RDS: make socket bind/release locking scheme simple and more efficient    Santosh Shilimkar    2015-09-30    1    -20/+15
    The RDS bind and release locking scheme is very inefficient. It uses
    RCU for maintaining the bind hash table, which is great, but it also
    needs to hold a spinlock for [add/remove]_bound(), so overall the
    concurrency speedup from the hash table doesn't pay off. In fact, the
    blocking nature of synchronize_rcu() makes RDS socket shutdown too
    slow, which hurts RDS performance since connection shutdown and
    re-connect happen quite often to maintain the RC part of the protocol.

    So make the locking scheme simpler and more efficient by replacing the
    spinlocks with reader/writer locks and getting rid of RCU for the bind
    hash table. A subsequent patch also converts the global lock to
    per-bucket locks to reduce the global lock contention.

    Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
    Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
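    In outline, the scheme after this change looks like the following
    sketch (the lock and helper names are approximate):

        /* Lookups take the lock shared ... */
        read_lock_irqsave(&rds_bind_lock, flags);
        rs = rds_bind_lookup(addr, port);
        read_unlock_irqrestore(&rds_bind_lock, flags);

        /* ... while bind/release take it exclusively; no synchronize_rcu()
         * is left in the socket-release path. */
        write_lock_irqsave(&rds_bind_lock, flags);
        hlist_add_head(&rs->rs_bound_node, head);       /* or hlist_del_init() */
        write_unlock_irqrestore(&rds_bind_lock, flags);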
* RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net    Sowmini Varadhan    2015-08-07    1    -1/+2
    Open the sockets by calling sock_create_kern() with the correct struct
    net pointer, and use that struct net pointer when verifying the address
    passed to rds_bind().

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
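    The relevant kernel API takes the namespace explicitly; a sketch of the
    pattern (not the RDS-TCP code itself):

        struct socket *sock;
        int ret;

        /* Create the listen socket in the caller's netns rather than init_net. */
        ret = sock_create_kern(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);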
* net/rds: Add setsockopt support for SO_RDS_TRANSPORT    Sowmini Varadhan    2015-05-31    1    -0/+4
    An application may deterministically attach the underlying transport
    for a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
    option at the SOL_RDS level. The integer argument to setsockopt must be
    one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
    must be specified before invoking bind(2) on the socket, and may only
    be used once on the socket. An attempt to set the option on a bound
    socket, or to invoke the option after a successful SO_RDS_TRANSPORT
    attachment, will return EOPNOTSUPP.

    Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
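    From user space the option is set between socket() and bind(); a short
    sketch (constants come from linux/rds.h):

        int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
        int trans = RDS_TRANS_TCP;

        /* Must precede bind(2); a second attempt, or an attempt on an
         * already-bound socket, fails with EOPNOTSUPP. */
        if (setsockopt(fd, SOL_RDS, SO_RDS_TRANSPORT, &trans, sizeof(trans)) < 0)
                perror("setsockopt(SO_RDS_TRANSPORT)");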
* net: replace macros net_random and net_srandom with direct calls to prandom    Aruna-Hewapathirane    2014-01-14    1    -1/+1
    This patch removes the net_random and net_srandom macros and replaces
    them with direct calls to the prandom ones. As new commits only seem to
    use prandom_u32, there is no reason to keep them around. This change
    makes it easier to grep for users of prandom_u32.

    Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
    Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
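    The substitution is mechanical; in a port-picking expression like the
    one in rds_bind() it has roughly this shape (illustrative, not the
    exact hunk):

        - rover = max_t(u16, net_random(), 2);
        + rover = max_t(u16, prandom_u32(), 2);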
* hlist: drop the node parameter from iterators    Sasha Levin    2013-02-27    1    -2/+1
    I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

     - Fix up the actual hlist iterators in linux/list.h
     - Fix up the declaration of other iterators based on the hlist ones.
     - A very small number of places were using the 'node' parameter; this
       was modified to use 'obj->member' instead.
     - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.

    The semantic patch, which is mostly the work of Peter Senna Tschudin,
    is here:

        @@
        iterator name hlist_for_each_entry, hlist_for_each_entry_continue,
        hlist_for_each_entry_from, hlist_for_each_entry_rcu,
        hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh,
        for_each_busy_worker, ax25_uid_for_each, ax25_for_each,
        inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each,
        sk_for_each_rcu, sk_for_each_from, sk_for_each_safe,
        sk_for_each_bound, hlist_for_each_entry_safe,
        hlist_for_each_entry_continue_rcu, nr_neigh_for_each,
        nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe,
        for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
        type T;
        expression a,c,d,e;
        identifier b;
        statement S;
        @@
        -T b;
        <+... when != b
        (
        hlist_for_each_entry(a, - b, c, d) S
        |
        hlist_for_each_entry_continue(a, - b, c) S
        |
        hlist_for_each_entry_from(a, - b, c) S
        |
        hlist_for_each_entry_rcu(a, - b, c, d) S
        |
        hlist_for_each_entry_rcu_bh(a, - b, c, d) S
        |
        hlist_for_each_entry_continue_rcu_bh(a, - b, c) S
        |
        for_each_busy_worker(a, c, - b, d) S
        |
        ax25_uid_for_each(a, - b, c) S
        |
        ax25_for_each(a, - b, c) S
        |
        inet_bind_bucket_for_each(a, - b, c) S
        |
        sctp_for_each_hentry(a, - b, c) S
        |
        sk_for_each(a, - b, c) S
        |
        sk_for_each_rcu(a, - b, c) S
        |
        sk_for_each_from -(a, b) +(a) S
        + sk_for_each_from(a) S
        |
        sk_for_each_safe(a, - b, c, d) S
        |
        sk_for_each_bound(a, - b, c) S
        |
        hlist_for_each_entry_safe(a, - b, c, d, e) S
        |
        hlist_for_each_entry_continue_rcu(a, - b, c) S
        |
        nr_neigh_for_each(a, - b, c) S
        |
        nr_neigh_for_each_safe(a, - b, c, d) S
        |
        nr_node_for_each(a, - b, c) S
        |
        nr_node_for_each_safe(a, - b, c, d) S
        |
        - for_each_gfn_sp(a, c, d, b) S
        + for_each_gfn_sp(a, c, d) S
        |
        - for_each_gfn_indirect_valid_sp(a, c, d, b) S
        + for_each_gfn_indirect_valid_sp(a, c, d) S
        |
        for_each_host(a, - b, c) S
        |
        for_each_host_safe(a, - b, c, d) S
        |
        for_each_mesh_entry(a, - b, c, d) S
        )
        ...+>

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Gleb Natapov <gleb@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
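    For reference, a caller in bind.c goes from the old three-argument form
    to the list-style two-argument form, roughly:

        struct rds_sock *rs;

        - struct hlist_node *node;
        - hlist_for_each_entry(rs, node, head, rs_bound_node)
        + hlist_for_each_entry(rs, head, rs_bound_node)
                if (rs->rs_bound_addr == addr && rs->rs_bound_port == port)
                        return rs;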
* net/rds: use printk_ratelimited() instead of printk_ratelimit()    Manuel Zerpies    2011-06-17    1    -2/+2
    Since printk_ratelimit() shouldn't be used anymore (see comment in
    include/linux/printk.h), replace it with printk_ratelimited().

    Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
    Signed-off-by: David S. Miller <davem@conan.davemloft.net>
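    The replacement collapses the old two-step pattern into one call, e.g.:

        - if (printk_ratelimit())
        -         printk(KERN_INFO "RDS: ...\n");
        + printk_ratelimited(KERN_INFO "RDS: ...\n");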
* rds: Use RCU for the bind lookup searches    Chris Mason    2010-09-08    1    -44/+46
    The RDS bind lookups are somewhat expensive in terms of CPU time and
    locking overhead. This commit changes them into a faster RCU-based hash
    table instead of the rbtree they were using before. On large NUMA
    systems it is a significant improvement.

    Signed-off-by: Chris Mason <chris.mason@oracle.com>
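    The read side of such a lookup, sketched in today's iterator spelling
    (this commit itself predates the iterator cleanup further down this
    log; 'head' is the hash bucket for the address/port pair):

        struct rds_sock *rs, *found = NULL;

        rcu_read_lock();
        hlist_for_each_entry_rcu(rs, head, rs_bound_node) {
                if (rs->rs_bound_addr == addr && rs->rs_bound_port == port) {
                        rds_sock_addref(rs);    /* pin the socket before leaving RCU */
                        found = rs;
                        break;
                }
        }
        rcu_read_unlock();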
* rds: switch to rwlock on bind_lock    Chris Mason    2010-09-08    1    -7/+7
    The bind_lock is almost entirely read-only, but it gets hammered during
    normal operations and is a major bottleneck. This commit changes it to
    an rwlock, which takes it from 80% of the system time on a big NUMA
    machine down to much lower numbers.

    A better fix would involve RCU, which is done in a later commit.

    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons    Andy Grover    2010-09-08    1    -2/+2
    Favor "if (foo)" style over "if (foo != NULL)".

    Signed-off-by: Andy Grover <andy.grover@oracle.com>
* RDS: Add a debug message suggesting to load transport modules    Andy Grover    2009-08-23    1    -0/+3
    Now that RDS transports are no longer compiled into the RDS core, there
    is the possibility that they will not be loaded. This adds a helpful
    suggestion when rds_bind() fails to find a transport.

    Signed-off-by: Andy Grover <andy.grover@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* RDS: Socket interface    Andy Grover    2009-02-26    1    -0/+199
    Implement the RDS (Reliable Datagram Sockets) interface.

    Signed-off-by: Andy Grover <andy.grover@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
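    The user-visible shape of that interface, as a minimal sketch; the port
    and address values are placeholders to be replaced with a local
    RDS-capable interface address:

        #include <stdio.h>
        #include <string.h>
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>
        #include <linux/rds.h>

        int main(void)
        {
                /* RDS sockets are datagram-oriented but reliable. */
                int fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
                struct sockaddr_in sin;

                memset(&sin, 0, sizeof(sin));
                sin.sin_family = AF_INET;
                sin.sin_port = htons(18634);                    /* arbitrary port */
                sin.sin_addr.s_addr = inet_addr("192.0.2.1");   /* placeholder address */

                /* bind() is where net/rds/bind.c comes in: it records the
                 * local address/port and selects a transport for the socket. */
                if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
                        perror("bind");
                return 0;
        }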