path: root/net
Commit message | Author | Age | Files | Lines (-/+)
* cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods (Tejun Heo, 2013-08-08, 2 files, -18/+20)

    cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations, for the following reasons.

    * With the unified hierarchy, subsystems will be dynamically bound and unbound from cgroups, and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup. This is different from the current state where all css's are allocated and destroyed together with the associated cgroup, and in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use.

    * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for.

    * In the majority of cases, a subsystem only cares about its part of the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better.

    This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straightforward. A few noteworthy changes are:

    * ->css_alloc() now takes the css of the parent cgroup rather than the pointer to the new cgroup, as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems.

    * In kernel/cgroup.c::offline_css(), an unnecessary open-coded css dereference is replaced with local variable access.

    This patch shouldn't cause any behavior differences.

    v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that the ->css_free() invocation added by da0a12caff ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
    Acked-by: Michal Hocko <mhocko@suse.cz>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Acked-by: Aristeu Rozanski <aris@redhat.com>
    Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Matt Helsley <matthltc@us.ibm.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Steven Rostedt <rostedt@goodmis.org>
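    A rough sketch of the shape of the conversion described above (the function and controller names here are illustrative, not necessarily the exact hunks of this commit):

        /* before: methods take the cgroup and dig out their own css */
        static int cgrp_css_online(struct cgroup *cgrp);
        static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp);

        /* after: methods take the css directly; ->css_alloc() gets the
         * parent's css because the new cgroup's css does not exist yet */
        static int cgrp_css_online(struct cgroup_subsys_state *css);
        static struct cgroup_subsys_state *
        cgrp_css_alloc(struct cgroup_subsys_state *parent_css);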
* cgroup: add css_parent() (Tejun Heo, 2013-08-08, 1 file, -3/+5)

    Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses.

    This patch implements css_parent() which, given a css, returns its parent. The function is guaranteed to return a valid non-NULL parent css as long as the target css is not at the top of the hierarchy.

    freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly.

    * __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was the NULL test on cgroup->parent, which is now embedded in css_parent(), making the distinction moot. Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away.

    This patch shouldn't cause any behavior differences.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: add/update accessors which obtain subsys specific data from css (Tejun Heo, 2013-08-08, 1 file, -4/+7)

    css (cgroup_subsys_state) is usually embedded in a subsys specific data structure. Subsystems either use container_of() directly to cast from css to such a data structure or have an accessor function wrapping such a cast.

    As cgroup as a whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with css's.

    All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controller-specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch.

    * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such an accessor. Added.

    * memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
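    A minimal, standalone illustration of the accessor pattern described above (the struct layout and names mirror the kernel pattern but are illustrative, not the exact code added by this commit):

        #include <stddef.h>
        #include <stdio.h>

        #define container_of(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

        struct cgroup_subsys_state { int refcnt; };

        struct cgroup_cls_state {
                struct cgroup_subsys_state css;   /* must be the first field */
                unsigned int classid;
        };

        /* NULL-tolerant cast: because css is the first member, the offset
         * is zero and the compiler can fold the branch away */
        static struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
        {
                return css ? container_of(css, struct cgroup_cls_state, css) : NULL;
        }

        int main(void)
        {
                struct cgroup_cls_state cs = { .classid = 0x100001 };

                printf("%#x\n", css_cls_state(&cs.css)->classid);
                printf("%p\n", (void *)css_cls_state(NULL));
                return 0;
        }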
* netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state (Tejun Heo, 2013-08-08, 1 file, -28/+28)

    cgroup controller API will be converted to primarily use struct cgroup_subsys_state instead of struct cgroup. In preparation, make the internal functions of netprio_cgroup pass around @css instead of @cgrp.

    While at it, kill struct cgroup_netprio_state which only contained struct cgroup_subsys_state without serving any purpose. All functions are converted to deal with @css directly.

    This patch shouldn't cause any behavior differences.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Acked-by: David S. Miller <davem@davemloft.net>
* cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ (Tejun Heo, 2013-08-08, 2 files, -3/+3)

    The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup.

    We're about to revamp a large portion of the cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache.

    Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
* Merge tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs (Linus Torvalds, 2013-07-11, 2 files, -12/+29)

    Pull second set of NFS client updates from Trond Myklebust:
    "This mainly contains some small readdir optimisations that had dependencies on Al Viro's readdir rewrite. There is also a fix for a nasty deadlock which surfaced earlier in this merge window.

    Highlights include:
    - Fix an rpc_pipefs regression that causes a deadlock on mount
    - Readdir optimisations by Scott Mayhew and Jeff Layton
    - clean up the rpc_pipefs dentry operation setup"

    * tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
      SUNRPC: Fix a deadlock in rpc_client_register()
      rpc_pipe: rpc_dir_inode_operations can be static
      NFS: Allow nfs_updatepage to extend a write under additional circumstances
      NFS: Make nfs_readdir revalidate less often
      NFS: Make nfs_attribute_cache_expired() non-static
      rpc_pipe: set dentry operations at d_alloc time
      nfs: set verifier on existing dentries in nfs_prime_dcache
| * SUNRPC: Fix a deadlock in rpc_client_register() (Trond Myklebust, 2013-07-10, 1 file, -7/+9)

    Commit 384816051ca9125cd54750e59c780c2a2655fa4f (SUNRPC: fix races on PipeFS MOUNT notifications) introduces a regression when we call rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour. By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we end up deadlocking in gss_pipes_dentries_create_net().

    Fix is to register the client and release the mutex before calling rpcauth_create().

    Reported-by: Weston Andros Adamson <dros@netapp.com>
    Tested-by: Weston Andros Adamson <dros@netapp.com>
    Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
    Cc: <stable@vger.kernel.org> # : 3848160: SUNRPC: fix races on PipeFS MOUNT
    Cc: <stable@vger.kernel.org> # : e73f4cc: SUNRPC: split client creation
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * rpc_pipe: rpc_dir_inode_operations can be static (Fengguang Wu, 2013-07-09, 1 file, -1/+1)

    Hi Jeff,

    FYI, there are new sparse warnings show up in tree:
    git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-next
    head: 296afe1f58d55fd56ed85daaafafcfee39f59ece
    commit: 76fa66657900071016f2bae61de28f059f3f2abf [2/5] rpc_pipe: set dentry operations at d_alloc time

    >> net/sunrpc/rpc_pipe.c:496:31: sparse: symbol 'rpc_dir_inode_operations' was not declared. Should it be static?

    Please consider folding the attached diff :-)

    Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * rpc_pipe: set dentry operations at d_alloc time (Jeff Layton, 2013-07-09, 1 file, -5/+20)

    Currently the way these get set is a little convoluted. If the dentry is allocated via lookup from userland, then it gets set by simple_lookup. If it gets allocated when the kernel is populating the directory, then it gets set via __rpc_lookup_create_exclusive, which has to check whether they might already be set. Between both of these, this ensures that all dentries have their d_op pointer set.

    Instead of doing that, just have them set at d_alloc time by pointing sb->s_d_op at them. With that change, we no longer want the lookup op to set them, so we must move to using our own lookup routine.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
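    A rough sketch of the mechanism described above, assuming a superblock setup helper along these lines (rpc_fill_super() and rpc_dentry_operations stand in for the real names): with sb->s_d_op set, d_alloc() attaches the operations to every new dentry, so neither the lookup path nor the kernel-side populate path has to do it by hand.

        static int rpc_fill_super(struct super_block *sb, void *data, int silent)
        {
                /* every dentry allocated under this sb gets these d_ops */
                sb->s_d_op = &rpc_dentry_operations;
                /* ... rest of the superblock setup ... */
                return 0;
        }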
* | Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs (Linus Torvalds, 2013-07-11, 3 files, -76/+167)

    Pull second round of 9p patches from Eric Van Hensbergen:
    "Several of these patches were rebased in order to correct style issues. Only stylistic changes were made versus the patches which were in linux-next for two weeks. The rebases have been in linux-next for 3 days and have passed my regressions.

    The bulk of these are RDMA fixes and improvements. There's also some additions on the extended attributes front to support some additional namespaces and a new option for TCP to force allocation of mount requests from a privileged port"

    * tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
      fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
      9P: Add cancelled() to the transport functions.
      9P/RDMA: count posted buffers without a pending request
      9P/RDMA: Improve error handling in rdma_request
      9P/RDMA: Do not free req->rc in error handling in rdma_request()
      9P/RDMA: Use a semaphore to protect the RQ
      9P/RDMA: Protect against duplicate replies
      9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
      9pnet: refactor struct p9_fcall alloc code
      9P/RDMA: rdma_request() needs not allocate req->rc
      9P: Fix fcall allocation for rdma
      fs/9p: xattr: add trusted and security namespaces
      net/9p: add privport option to 9p tcp transport
| * | 9P: Add cancelled() to the transport functions. (Simon Derr, 2013-07-07, 2 files, -3/+20)

    RDMA needs to post a buffer for each incoming reply. Hence it needs to keep count of these and needs to be aware of whether a flushed request has received a reply or not.

    This patch adds the cancelled() callback to the transport modules. It is called when RFLUSH has been received and the corresponding request will never receive a reply.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P/RDMA: count posted buffers without a pending request (Simon Derr, 2013-07-07, 2 files, -3/+34)

    In rdma_request(): If an error occurs between posting the recv and the send, there will be a reply context posted without a pending request. Since there is no way to "un-post" it, we remember it and skip post_recv() for the next request.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
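    A sketch of the bookkeeping described above, under the assumption of an atomic counter (excess_rc is an illustrative field name for "a recv buffer is posted but no request owns it"):

        /* error path between post_recv() and the send: remember the
         * orphaned recv buffer instead of trying to "un-post" it */
        atomic_inc(&rdma->excess_rc);

        /* start of the next request: reuse the orphaned buffer if any,
         * otherwise post a fresh one for this request's reply */
        if (atomic_dec_if_positive(&rdma->excess_rc) < 0) {
                err = post_recv(client, rpl_context);
                if (err)
                        goto recv_error;
        }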
| * | 9P/RDMA: Improve error handling in rdma_request (Simon Derr, 2013-07-07, 1 file, -16/+28)

    Most importantly:
    - do not free the recv context (rpl_context) after a successful post_recv()
    - but do free the send context (c) after a failed send.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P/RDMA: Do not free req->rc in error handling in rdma_request() (Simon Derr, 2013-07-07, 1 file, -6/+3)

    rdma_request() should never be in charge of freeing rc.

    When an error occurs:
    * Either the rc buffer has been recv_post()'ed, and kfree()'ing it certainly is a bad idea.
    * Or it has not, and in that case req->rc still points to it, hence it need not be freed.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P/RDMA: Use a semaphore to protect the RQ (Simon Derr, 2013-07-07, 1 file, -10/+12)

    The current code keeps track of the number of buffers posted in the RQ, and will prevent it from overflowing. But it does so by simply dropping post requests (and leaking memory in the process). When this happens there will actually be too few buffers posted, and soon the 9P server will complain about 'RNR retry counter exceeded' errors.

    Instead, use a semaphore, and block until the RQ is ready for another buffer to be posted.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
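    A sketch of the flow control described above, assuming a semaphore (rq_sem) initialised to the RQ depth: each post consumes a slot and each completed receive releases one, so posts block instead of being dropped.

        /* before posting a receive buffer: wait for a free RQ slot */
        if (down_interruptible(&rdma->rq_sem))
                goto error;                     /* interrupted, fail the request */
        err = post_recv(client, rpl_context);

        /* in the receive completion handler, once a posted buffer has
         * been consumed by an incoming reply: */
        up(&rdma->rq_sem);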
| * | 9P/RDMA: Protect against duplicate replies (Simon Derr, 2013-07-07, 1 file, -0/+7)

    A well-behaved server would not send the reply to a request twice. But if it ever happens... This additional check prevents the kernel from leaking memory and possibly more nasty consequences in that unlikely event.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB (Simon Derr, 2013-07-07, 1 file, -3/+1)

    The current value is too low to get good performance.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9pnet: refactor struct p9_fcall alloc code (Simon Derr, 2013-07-07, 1 file, -17/+18)

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P/RDMA: rdma_request() needs not allocate req->rc (Simon Derr, 2013-07-07, 1 file, -19/+0)

    p9_tag_alloc() takes care of that.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | 9P: Fix fcall allocation for rdma (Simon Derr, 2013-07-07, 1 file, -16/+23)

    The current code assumes that when a request in the request array does have a tc, it also has a rc. This is normally true, but not always: when using RDMA, req->rc will temporarily be set to NULL after the request has been sent.

    That is usually OK though, as when the reply arrives, req->rc will be reassigned to a sane value before the request is recycled.

    But there is a catch: if the request is flushed, the reply will never arrive, and req->rc will be NULL, but not req->tc.

    This patch fixes p9_tag_alloc to take this into account.

    Signed-off-by: Simon Derr <simon.derr@bull.net>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | net/9p: add privport option to 9p tcp transport (Jim Garlick, 2013-07-07, 1 file, -1/+39)

    If the privport option is specified, the tcp transport binds the local address to a reserved port before connecting to the 9p server. In some cases when 9P AUTH cannot be implemented, this is better than nothing.

    Signed-off-by: Jim Garlick <garlick@llnl.gov>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
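    A sketch of the privport behaviour described above: before connecting, bind the socket to a port below 1024, walking down from 1023 until a free one is found (the port range and helper name are illustrative, not necessarily the exact code in this patch).

        static int p9_bind_privport(struct socket *sock)
        {
                struct sockaddr_in cl = {
                        .sin_family      = AF_INET,
                        .sin_addr.s_addr = htonl(INADDR_ANY),
                };
                int port, err = -EADDRINUSE;

                for (port = 1023; port >= 512; port--) {
                        cl.sin_port = htons(port);
                        err = kernel_bind(sock, (struct sockaddr *)&cl, sizeof(cl));
                        if (err != -EADDRINUSE)
                                break;          /* bound, or a hard error */
                }
                return err;
        }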
* | | Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux (Linus Torvalds, 2013-07-11, 6 files, -51/+63)

    Pull nfsd changes from Bruce Fields:
    "Changes this time include:

    - 4.1 enabled on the server by default: the last 4.1-specific issues I know of are fixed, so we're not going to find the rest of the bugs without more exposure.

    - Experimental support for NFSv4.2 MAC Labeling (to allow running selinux over NFS), from Dave Quigley.

    - Fixes for some delicate cache/upcall races that could cause rare server hangs; thanks to Neil Brown and Bodo Stroesser for extreme debugging persistence.

    - Fixes for some bugs found at the recent NFS bakeathon, mostly v4 and v4.1-specific, but also a generic bug handling fragmented rpc calls"

    * 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
      nfsd4: support minorversion 1 by default
      nfsd4: allow destroy_session over destroyed session
      svcrpc: fix failures to handle -1 uid's
      sunrpc: Don't schedule an upcall on a replaced cache entry.
      net/sunrpc: xpt_auth_cache should be ignored when expired.
      sunrpc/cache: ensure items removed from cache do not have pending upcalls.
      sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
      sunrpc/cache: remove races with queuing an upcall.
      nfsd4: return delegation immediately if lease fails
      nfsd4: do not throw away 4.1 lock state on last unlock
      nfsd4: delegation-based open reclaims should bypass permissions
      svcrpc: don't error out on small tcp fragment
      svcrpc: fix handling of too-short rpc's
      nfsd4: minor read_buf cleanup
      nfsd4: fix decoding of compounds across page boundaries
      nfsd4: clean up nfs4_open_delegation
      NFSD: Don't give out read delegations on creates
      nfsd4: allow client to send no cb_sec flavors
      nfsd4: fail attempts to request gss on the backchannel
      nfsd4: implement minimal SP4_MACH_CRED
      ...
| * | | svcrpc: fix failures to handle -1 uid's (J. Bruce Fields, 2013-07-08, 1 file, -2/+0)

    As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1 (0xffff) uid or gid would fail with a badcred error.

    Commit afe3c3fd5392b2f0066930abc5dbd3f4b14a0f13 "svcrpc: fix failures to handle -1 uid's and gid's" fixed part of the problem, but overlooked the gid upcall--the kernel can request supplementary gid's for the -1 uid, but mountd's attempt to write a response will get -EINVAL.

    Symptoms were nfsd failing to reply to the first attempt to use a newly negotiated krb5 context.

    Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
    Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc: Don't schedule an upcall on a replaced cache entry. (NeilBrown, 2013-07-01, 1 file, -1/+2)

    When a cache entry is replaced, the "expiry_time" gets set to zero by a call to "cache_fresh_locked(..., 0)" at the end of "sunrpc_cache_update".

    This low expiry time makes cache_check() think that the 'refresh_age' is negative, so the 'age' is comparatively large and a refresh is triggered. However refreshing a replaced entry is pointless; it cannot achieve anything useful.

    So teach cache_check to ignore a low refresh_age when expiry_time is zero.

    Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
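    A sketch of the guard described above, inside cache_check(): only treat a large relative age as "needs refresh" when the entry has a real expiry time, since replaced entries carry expiry_time == 0 (surrounding logic is paraphrased, not the exact hunk).

        if (rv == -EAGAIN ||
            (h->expiry_time != 0 && age > refresh_age / 2)) {
                /* ... queue/schedule the upcall to refresh the entry ... */
        }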
| * | | net/sunrpc: xpt_auth_cache should be ignored when expired. (NeilBrown, 2013-07-01, 2 files, -8/+2)

    commit d202cce8963d9268ff355a386e20243e8332b308
        sunrpc: never return expired entries in sunrpc_cache_lookup

    moved the 'entry is expired' test from cache_check to sunrpc_cache_lookup, so that it happened early and some races could safely be ignored.

    However the ip_map (in svcauth_unix.c) has a separate single-item cache which allows quick lookup without locking. An entry in this case would not be subject to the expiry test and so could be used well after it has expired.

    This is not normally a big problem because the first time it is used after it is expired an up-call will be scheduled to refresh the entry (if it hasn't been scheduled already) and the old entry will then be invalidated. So on the second attempt to use it after it has expired, ip_map_cached_get will discard it.

    However that is subtle and not ideal, so replace the "!cache_valid" test with "cache_is_expired". In doing this we drop the test on the "CACHE_VALID" bit. This is unnecessary as the bit is never cleared, and an entry will only be cached if the bit is set.

    Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc/cache: ensure items removed from cache do not have pending upcalls. (NeilBrown, 2013-07-01, 1 file, -1/+5)

    It is possible for a race to set CACHE_PENDING after cache_clean() has removed a cache entry from the cache. If CACHE_PENDING is still set when the entry is finally 'put', the cache_dequeue() will never happen and we can leak memory.

    So set a new flag 'CACHE_CLEANED' when we remove something from the cache, and don't queue any upcall if it is set.

    If CACHE_PENDING is set before CACHE_CLEANED, the call that cache_clean() makes to cache_fresh_unlocked() will free memory as needed. If CACHE_PENDING is set after CACHE_CLEANED, the test in sunrpc_cache_pipe_upcall will ensure that the memory is not allocated.

    Reported-by: <bstroesser@ts.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc/cache: use cache_fresh_unlocked consistently and correctly. (NeilBrown, 2013-07-01, 1 file, -13/+7)

    cache_fresh_unlocked() is called when a cache entry has been updated and ensures that if there were any pending upcalls, they are cleared.

    So every time we update a cache entry, we should call this, and this should be the only way that we try to clear pending calls (that sort of uniformity makes code sooo much easier to read).

    try_to_negate_entry() will (possibly) mark an entry as negative. If it doesn't, it is because the entry already is VALID. So the entry will be valid on exit, so it is appropriate to call cache_fresh_unlocked(). So tidy up try_to_negate_entry() to do that, and remove the partial open-coded cache_fresh_unlocked() from the one call-site of try_to_negate_entry().

    In the other branch of the 'switch(cache_make_upcall())', we again have a partial open-coded version of cache_fresh_unlocked(). Replace that with a real call.

    And again in cache_clean(), use a real call to cache_fresh_unlocked().

    These call sites might previously have called cache_revisit_request() if CACHE_PENDING wasn't set. This is never necessary because cache_revisit_request() can only do anything if the item is in the cache_defer_hash. However any time that an item is added to the cache_defer_hash (setup_deferral), the code immediately tests CACHE_PENDING, and removes the entry again if it is clear. So in all other places we only need to call cache_revisit_request() if we've just cleared CACHE_PENDING.

    Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | sunrpc/cache: remove races with queuing an upcall. (NeilBrown, 2013-07-01, 1 file, -11/+29)

    We currently queue an upcall after setting CACHE_PENDING, and dequeue after clearing CACHE_PENDING. So a request should only be present when CACHE_PENDING is set.

    However we don't combine the test and the enqueue/dequeue in a protected region, so it is possible (if unlikely) for a race to result in a request being queued without CACHE_PENDING set, or a request to be absent despite CACHE_PENDING.

    So: include a test for CACHE_PENDING inside the regions of enqueue and dequeue where queue_lock is held, and abort the operation if the value is not as expected.

    Also remove the early 'return' from cache_dequeue() to ensure that it always removes all entries: as there is no locking between setting CACHE_PENDING and calling sunrpc_cache_pipe_upcall, it is not inconceivable for some other thread to clear CACHE_PENDING and then someone else to set it and call sunrpc_cache_pipe_upcall, both before the original thread completed the call.

    With this, it is perfectly safe and correct to:
    - call cache_dequeue() if and only if we have just cleared CACHE_PENDING
    - call sunrpc_cache_pipe_upcall() (via cache_make_upcall) if and only if we have just set CACHE_PENDING.

    Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | svcrpc: don't error out on small tcp fragment (J. Bruce Fields, 2013-07-01, 1 file, -1/+1)

    Though clients we care about mostly don't do this, it is possible for rpc requests to be sent in multiple fragments. Here we have a sanity check to ensure that the final received rpc isn't too small--except that the number we're actually checking is the length of just the final fragment, not of the whole rpc. So a perfectly legal rpc that's unluckily fragmented could cause the server to close the connection here.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | svcrpc: fix handling of too-short rpc's (J. Bruce Fields, 2013-07-01, 1 file, -2/+7)

    If we detect that an rpc is too short, we abort and close the connection. Except, there's a bug here: we're leaving sk_datalen nonzero without leaving any pages in the sk_pages array. The most likely result of the inconsistency is a subsequent crash in svc_tcp_clear_pages.

    Also demote the BUG_ON in svc_tcp_clear_pages to a WARN.

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | svcrpc: store gss mech in svc_cred (J. Bruce Fields, 2013-07-01, 2 files, -4/+5)

    Store a pointer to the gss mechanism used in the rq_cred and cl_cred. This will make it easier to enforce SP4_MACH_CRED, which needs to compare the mechanism used on the exchange_id with that used on protected operations.

    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | svcrpc: introduce init_svc_cred (J. Bruce Fields, 2013-07-01, 1 file, -4/+2)

    Common helper to zero out fields of the svc_cred.

    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
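    The helper described above is small; roughly (field list shown as of this series, given the gss mech pointer added just before):

        static inline void init_svc_cred(struct svc_cred *cred)
        {
                cred->cr_group_info = NULL;
                cred->cr_principal = NULL;
                cred->cr_gss_mech = NULL;
        }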
| * | | Merge branch 'for-3.10' into 'for-3.11' (J. Bruce Fields, 2013-07-01, 2 files, -8/+12)

    Merge bugfixes into my for-3.11 branch.
| * | | | sunrpc: the cache_detail in cache_is_valid is unused any more (chaoting fan, 2013-05-21, 1 file, -4/+4)

    The cache_detail argument (*detail) of the function cache_is_valid is not used any more.

    Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | sunrpc: server back channel needs no rpcbind method (J. Bruce Fields, 2013-05-15, 1 file, -1/+0)

    XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp() (using xprt_set_bound()), and is never cleared, so ->rpcbind() will never need to be called.

    Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (Linus Torvalds, 2013-07-09, 331 files, -7956/+11095)

    Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'"). From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet.

    3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie lifetime handling, to the SCTP protocol code. From Daniel Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon Horman.

    17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
      drivers/net: caif: fix wrong rtnl_is_locked() usage
      drivers/net: enic: release rtnl_lock on error-path
      vhost-net: fix use-after-free in vhost_net_flush
      net: mv643xx_eth: do not use port number as platform device id
      net: sctp: confirm route during forward progress
      virtio_net: fix race in RX VQ processing
      virtio: support unlocked queue poll
      net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
      Documentation: Fix references to defunct linux-net@vger.kernel.org
      net/fs: change busy poll time accounting
      net: rename low latency sockets functions to busy poll
      bridge: fix some kernel warning in multicast timer
      sfc: Fix memory leak when discarding scattered packets
      sit: fix tunnel update via netlink
      dt:net:stmmac: Add dt specific phy reset callback support.
      dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
      dt:net:stmmac: Allocate platform data only if its NULL.
      net:stmmac: fix memleak in the open method
      ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
      net: ipv6: fix wrong ping_v6_sendmsg return value
      ...
| * | | | net: sctp: confirm route during forward progress (Daniel Borkmann, 2013-07-09, 2 files, -0/+15)

    This fix has been proposed originally by Vlad Yasevich. He says:

      When SCTP makes forward progress (receives a SACK that acks new chunks, renegs, or answers zero-window probes) or when a HB-ACK arrives, mark the route as confirmed so we don't unnecessarily send NUD probes.

    Having a simple SCTP client/server that exchange data chunks every 1 sec, without this patch ARP requests are sent periodically every 40-60 sec. With this fix applied, an ARP request is only done once right at the "session" beginning. Also, when clearing the related ARP cache entry manually during the session, a new request is correctly done.

    I have only "backported" this to net-next and tested that it works, so full credit goes to Vlad.

    Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
    Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
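    A sketch of the idea described above (the helper name is illustrative): when a SACK acks new data or a HEARTBEAT-ACK arrives, confirm the transport's cached route so neighbour discovery stops re-probing the peer.

        static void sctp_transport_route_confirm(struct sctp_transport *t)
        {
                if (t->dst)
                        dst_confirm(t->dst);    /* mark the dst's neighbour as confirmed */
        }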
| * | | | net: rename low latency sockets functions to busy poll (Eliezer Tamir, 2013-07-08, 3 files, -10/+11)

    Rename functions in include/net/ll_poll.h to busy wait. Clarify documentation about expected power use increase. Rename POLL_LL to POLL_BUSY_LOOP. Add need_resched() testing to poll/select busy loops.

    Note that in select and poll can_busy_poll is dynamic and is updated continuously to reflect the existence of supported sockets with valid queue information.

    Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | bridge: fix some kernel warning in multicast timer (Cong Wang, 2013-07-06, 2 files, -2/+2)

    Several people reported the warning: "kernel BUG at kernel/timer.c:729!" and the stack trace is:

      #7  [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
      #8  [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
      #9  [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
      #10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
      #11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
      #12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
      #13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
      #14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
      #15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
      #16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
      #17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
      #18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
      #19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
      #20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
      #21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
      #22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
      #23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
      #24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
      #25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

    This is because I forgot to check if mp->timer is armed in br_multicast_del_pg(). This bug was introduced by commit 9f00b2e7cf241fa389733d41b6 (bridge: only expire the mdb entry when query is received). Same for __br_mdb_del().

    Tested-by: poma <pomidorabelisima@gmail.com>
    Reported-by: LiYonghua <809674045@qq.com>
    Reported-by: Robert Hancock <hancockrwd@gmail.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | sit: fix tunnel update via netlink (Nicolas Dichtel, 2013-07-04, 1 file, -2/+2)

    The device can sit in another netns, hence we need to do the lookup in netns tunnel->net.

    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available (Hannes Frederic Sowa, 2013-07-03, 1 file, -0/+2)

    After the removal of rt->n we do not create a neighbour entry at route insertion time (rt6_bind_neighbour is gone). As long as no neighbour is created because of "useful traffic" we skip this routing entry because rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and thus returns false.

    This change was introduced by commit 887c95cc1da53f66a5890fdeab13414613010097 ("ipv6: Complete neighbour entry removal from dst_entry.")

    To quote RFC 4191: "If the host has no information about the router's reachability, then the host assumes the router is reachable."

    and also: "A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable."

    So, just assume the router is reachable and let rt6_probe do the rest. We don't need to create a neighbour on route insertion time.

    If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC 4191 support) a neighbour is only valid if its nud_state is NUD_VALID. I did not find any references that we should probe the router on route insertion time via the other RFCs. So skip this route in that case.

    v2: a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)

    Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
    Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
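    A sketch of the logic described above, inside rt6_check_neigh() (paraphrased, not the exact hunk): with no neighbour entry and therefore no NUD information, treat the router as reachable when RFC 4191 router preferences are built in, otherwise skip the route.

        if (!neigh)
                return IS_ENABLED(CONFIG_IPV6_ROUTER_PREF);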
| * | | | net: ipv6: fix wrong ping_v6_sendmsg return value (Lorenzo Colitti, 2013-07-03, 1 file, -1/+4)

    ping_v6_sendmsg currently returns 0 on success. It should return the number of bytes written instead.

    Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: ipv6: add missing lock in ping_v6_sendmsg (Lorenzo Colitti, 2013-07-03, 1 file, -0/+2)

    Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | netem: fix possible NULL deref in netem_dequeue() (Eric Dumazet, 2013-07-03, 1 file, -3/+5)

    commit aec0a40a6f7884 ("netem: use rb tree to implement the time queue") added a regression if a child qdisc is attached to netem, as we perform a NULL dereference.

    Fix this by adding a temporary variable to cache netem_skb_cb(skb)->time_to_send.

    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
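    The essence of the fix described above (surrounding control flow omitted): read the send time out of the skb control block into a local before the skb can be handed to, and possibly freed by, an attached child qdisc, and use the cached value afterwards.

        psched_time_t time_to_send = netem_skb_cb(skb)->time_to_send;

        /* the skb may now be enqueued to q->qdisc (the child qdisc) and
         * freed there; later comparisons use time_to_send instead of
         * dereferencing netem_skb_cb(skb) again */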
| * | | | core: Copy inner_protocol in copy_skb_header() (Joe Stringer, 2013-07-03, 1 file, -0/+1)

    inner_protocol was added to struct sk_buff in 0d89d2035fe063461a5ddb609b2c12e7fb006e44 ("MPLS: Add limited GSO support"), which is scheduled to be included in v3.11. That patch did not update __copy_skb_header to copy the inner_protocol.

    Signed-off-by: Joe Stringer <joe@wand.net.nz>
    Signed-off-by: Simon Horman <horms@verge.net.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
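    The fix described above amounts to one extra field copy in __copy_skb_header(), roughly (other field copies elided):

        static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
        {
                /* ... existing field copies ... */
                new->inner_protocol = old->inner_protocol;
                /* ... */
        }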
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2013-07-03, 15 files, -58/+88)

    Conflicts:
      drivers/net/ethernet/freescale/fec_main.c
      drivers/net/ethernet/renesas/sh_eth.c
      net/ipv4/gre.c

    The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into separate files.

    The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block.

    Finally, the sh_eth.c conflict was between one commit adding bits to the .eesr_err_check mask and another commit removing the .tx_error_check member and assignments.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: gre: move GSO functions to gre_offload (Daniel Borkmann, 2013-07-03, 3 files, -103/+131)

    Similarly to TCP/UDP offloading, move all related GRE functions to gre_offload.c to make things more explicit and similar to the rest of the code.

    Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ip_tunnels: Use skb-len to PMTU check. (Pravin B Shelar, 2013-07-02, 1 file, -44/+55)

    In the path MTU check, the IP header total length works for the gre device but not for the gre-tap device. Use the skb length, which is consistent for all tunneling types. This is an old bug in gre.

    This also fixes an mtu calculation bug introduced by commit c54419321455631079c7d ("GRE: Refactor GRE tunneling code").

    Reported-by: Timo Teras <timo.teras@iki.fi>
    Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
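    A sketch of the check described above, in the spirit of tnl_update_pmtu(): compare the packet length the device actually sees (skb->len, minus the tunnel header overhead shown here as tunnel->hlen) against the path MTU, rather than trusting the inner IP header's tot_len; the exact header adjustment is illustrative.

        int pkt_size = skb->len - tunnel->hlen;

        if (skb->protocol == htons(ETH_P_IP) &&
            (ip_hdr(skb)->frag_off & htons(IP_DF)) && pkt_size > mtu) {
                icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
                return -E2BIG;
        }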
| * | | | | l2tp: make datapath resilient to packet loss when sequence numbers enabled (James Chapman, 2013-07-02, 2 files, -5/+34)

    If L2TP data sequence numbers are enabled and reordering is not enabled, data reception stops if a packet is lost since the kernel waits for a sequence number that is never resent. (When reordering is enabled, data reception restarts when the reorder timeout expires.) If no reorder timeout is set, we should count the number of in-sequence packets after the out-of-sequence (OOS) condition is detected, and reset sequence number state after a number of such packets are received.

    For now, the number of in-sequence packets while in OOS state which cause the sequence number state to be reset is hard-coded to 5. This could be configurable later.

    Signed-off-by: James Chapman <jchapman@katalix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | l2tp: make datapath sequence number support RFC-compliant (James Chapman, 2013-07-02, 2 files, -5/+33)

    The L2TP datapath is not currently RFC-compliant when sequence numbers are used in L2TP data packets. According to the L2TP RFC, any received sequence number NR greater than or equal to the next expected NR is acceptable, where the "greater than or equal to" test is determined by the NR wrap point. This differs for L2TPv2 and L2TPv3, so add state in the session context to hold the max NR value and the NR window size in order to do the acceptable sequence number value check. These might be configurable later, but for now we derive them from the tunnel L2TP version, which determines the sequence number field size.

    Signed-off-by: James Chapman <jchapman@katalix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
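    A standalone illustration of the wrap-aware acceptance test described above (not the kernel code itself): nr is acceptable if it lies within the window of the next expected value, modulo the sequence-number space, where the mask is 0xffff for L2TPv2 and 0xffffff for L2TPv3.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        static bool l2tp_seq_ok(uint32_t nr, uint32_t expected,
                                uint32_t nr_max, uint32_t nr_window_size)
        {
                /* distance from the expected value, modulo (nr_max + 1) */
                return ((nr - expected) & nr_max) < nr_window_size;
        }

        int main(void)
        {
                /* L2TPv2: 16-bit counters, window of 32768 */
                printf("%d\n", l2tp_seq_ok(5, 0xfffe, 0xffff, 0x8000));      /* 1: wrapped but in window */
                printf("%d\n", l2tp_seq_ok(0x7ffe, 0xffff, 0xffff, 0x8000)); /* 1: just inside the window */
                printf("%d\n", l2tp_seq_ok(0xfffe, 0x0001, 0xffff, 0x8000)); /* 0: behind the expected value */
                return 0;
        }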