summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()Al Viro2014-04-012-1/+2
| | | | | | | | | | | | generic_file_aio_read() was looping over the target iovec, with loop over (source) pages nested inside that. Just set an iov_iter up and pass *that* to do_generic_file_aio_read(). With copy_page_to_iter() doing all work of mapping and copying a page to iovec and advancing iov_iter. Switch shmem_file_aio_read() to the same and kill file_read_actor(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: Move iov_iter to uio.hKent Overstreet2014-04-012-32/+50
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* switch ->is_partially_uptodate() to saner argumentsAl Viro2014-04-012-3/+3
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: kill ->map() and ->unmap()Al Viro2014-04-011-19/+0
| | | | | | all pipe_buffer_operations have the same instances of those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: readlink_copy()Al Viro2014-04-011-1/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of files_defer_init()Al Viro2014-04-011-2/+0
| | | | | | | | the only thing it's doing these days is calculation of upper limit for fs.nr_open sysctl and that can be done statically Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mark struct file that had write access grabbed by open()Al Viro2014-04-011-0/+2
| | | | | | | | | new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() in case when it has grabbed write access, checked by __fput() to decide whether it wants to drop the sucker. Allows to stop bothering with mnt_clone_write() in alloc_file(), along with fewer special_file() checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of DEBUG_WRITECOUNTAl Viro2014-04-011-49/+0
| | | | | | it only makes control flow in __fput() and friends more convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch nbd to sockfd_lookup/sockfd_putAl Viro2014-04-011-2/+1
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* smarter propagate_mnt()Al Viro2014-04-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current mainline has copies propagated to *all* nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vlan: Warn the user if lowerdev has bad vlan features.Vlad Yasevich2014-03-281-0/+7
| | | | | | | | | Some drivers incorrectly assign vlan acceleration features to vlan_features thus causing issues for Q-in-Q vlan configurations. Warn the user of such cases. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Account for all vlan headers in skb_mac_gso_segmentVlad Yasevich2014-03-281-1/+1
| | | | | | | | | | | | | | | | skb_network_protocol() already accounts for multiple vlan headers that may be present in the skb. However, skb_mac_gso_segment() doesn't know anything about it and assumes that skb->mac_len is set correctly to skip all mac headers. That may not always be the case. If we are simply forwarding the packet (via bridge or macvtap), all vlan headers may not be accounted for. A simple solution is to allow skb_network_protocol to return the vlan depth it has calculated. This way skb_mac_gso_segment will correctly skip all mac headers. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: move DAD and addrconf_verify processing to workqueueHannes Frederic Sowa2014-03-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue. To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing. (v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stop the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too. After the transition it is not needed any more. As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom half out when we lock spin_locks which are also used in bh. Relevant backtrace: [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496) [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1 [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8 [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18 [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000 [ 541.031152] Call Trace: [ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17 [ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180 [ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0 [ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0 [ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90 [ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6] [ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0 [ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6] [ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6] [ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6] [ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6] [ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6] [ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6] [ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6] [ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50 [ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6] [ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] [ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90 [ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errorsZoltan Kiss2014-03-271-2/+2
| | | | | | | | | | | skb_zerocopy can copy elements of the frags array between skbs, but it doesn't orphan them. Also, it doesn't handle errors, so this patch takes care of that as well, and modify the callers accordingly. skb_tx_error() is also added to the callers so they will signal the failed delivery towards the creator of the skb. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* usbnet: include wait queue head in device structureOliver Neukum2014-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This fixes a race which happens by freeing an object on the stack. Quoting Julius: > The issue is > that it calls usbnet_terminate_urbs() before that, which temporarily > installs a waitqueue in dev->wait in order to be able to wait on the > tasklet to run and finish up some queues. The waiting itself looks > okay, but the access to 'dev->wait' is totally unprotected and can > race arbitrarily. I think in this case usbnet_bh() managed to succeed > it's dev->wait check just before usbnet_terminate_urbs() sets it back > to NULL. The latter then finishes and the waitqueue_t structure on its > stack gets overwritten by other functions halfway through the > wake_up() call in usbnet_bh(). The fix is to just not allocate the data structure on the stack. As dev->wait is abused as a flag it also takes a runtime PM change to fix this bug. Signed-off-by: Oliver Neukum <oneukum@suse.de> Reported-by: Grant Grundler <grundler@google.com> Tested-by: Grant Grundler <grundler@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-03-243-8/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) OpenVswitch's lookup_datapath() returns error pointers, so don't check against NULL. From Jiri Pirko. 2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation under RCU locks, fix by using GFP_ATOMIC when necessary. From Nikolay Aleksandrov. 3) phy_suspend() indirectly passes uninitialized data into the ethtool get wake-on-land implementations. Fix from Sebastian Hesselbarth. 4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger. 5) If SKB allocation of reply packet fails, vxlan's arp_reduce() defers a NULL pointer. Fix from David Stevens. 6) IPV6 neigh handling in vxlan doesn't validate the destination address properly, and it builds a packet with the src and dst reversed. Fix also from David Stevens. 7) Fix spinlock recursion during subscription failures in TIPC stack, from Erik Hugne. 8) Revert buggy conversion of davinci_emac to devm_request_irq, from Chrstian Riesch. 9) Wrong flags passed into forwarding database netlink notifications, from Nicolas Dichtel. 10) The netpoll neighbour soliciation handler checks wrong ethertype, needs to be ETH_P_IPV6 rather than ETH_P_ARP. Fix from Li RongQing. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits) tipc: fix spinlock recursion bug for failed subscriptions vxlan: fix nonfunctional neigh_reduce() net: davinci_emac: Fix rollback of emac_dev_open() net: davinci_emac: Replace devm_request_irq with request_irq netpoll: fix the skb check in pkt_is_ns net: micrel : ks8851-ml: add vdd-supply support ip6mr: fix mfc notification flags ipmr: fix mfc notification flags rtnetlink: fix fdb notification flags tcp: syncookies: do not use getnstimeofday() netlink: fix setsockopt in mmap examples in documentation openvswitch: Correctly report flow used times for first 5 minutes after boot. via-rhine: Disable device in error path ATHEROS-ATL1E: Convert iounmap to pci_iounmap vxlan: fix potential NULL dereference in arp_reduce() cnic: Update version to 2.5.20 and copyright year. cnic,bnx2i,bnx2fc: Fix inconsistent use of page size cnic: Use proper ulp_ops for per device operations. net: cdc_ncm: fix control message ordering ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly ...
| * tcp: syncookies: do not use getnstimeofday()Eric Dumazet2014-03-201-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While it is true that getnstimeofday() uses about 40 cycles if TSC is available, it can use 1600 cycles if hpet is the clocksource. Switch to get_jiffies_64(), as this is more than enough, and go back to 60 seconds periods. Fixes: 8c27bd75f04f ("tcp: syncookies: reduce cookie lifetime to 128 seconds") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: cdc_ncm: fix control message orderingBjørn Mork2014-03-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a context modified revert of commit 6a9612e2cb22 ("net: cdc_ncm: remove ncm_parm field") which introduced a NCM specification violation, causing setup errors for some devices. These errors resulted in the device and host disagreeing about shared settings, with complete failure to communicate as the end result. The NCM specification require that many of the NCM specific control reuests are sent only while the NCM Data Interface is in alternate setting 0. Reverting the commit ensures that we follow this requirement. Fixes: 6a9612e2cb22 ("net: cdc_ncm: remove ncm_parm field") Reported-and-tested-by: Pasi Kärkkäinen <pasik@iki.fi> Reported-by: Thomas Schäfer <tschaefer@t-online.de> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2014-03-181-3/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a sleep in atomic when pfkey_sadb2xfrm_user_sec_ctx() is called from pfkey_compile_policy(). Fix from Nikolay Aleksandrov. 2) security_xfrm_policy_alloc() can be called in process and atomic context. Add an argument to let the callers choose the appropriate way. Fix from Nikolay Aleksandrov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * selinux: add gfp argument to security_xfrm_policy_alloc and fix callersNikolay Aleksandrov2014-03-101-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | security_xfrm_policy_alloc can be called in atomic context so the allocation should be done with GFP_ATOMIC. Add an argument to let the callers choose the appropriate way. In order to do so a gfp argument needs to be added to the method xfrm_policy_alloc_security in struct security_operations and to the internal function selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic callers and leave GFP_KERNEL as before for the rest. The path that needed the gfp argument addition is: security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security -> all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) -> selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only) Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also add it to security_context_to_sid which is used inside and prior to this patch did only GFP_KERNEL allocation. So add gfp argument to security_context_to_sid and adjust all of its callers as well. CC: Paul Moore <paul@paul-moore.com> CC: Dave Jones <davej@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Fan Du <fan.du@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: LSM list <linux-security-module@vger.kernel.org> CC: SELinux list <selinux@tycho.nsa.gov> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* | | Merge tag 'trace-fixes-v3.14-rc7' of ↵Linus Torvalds2014-03-202-9/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace fix from Steven Rostedt: "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the array index in the trace event format bogus. He supplied an elegant solution that uses __stringify() and also removes the need for the event_storage and event_storage_mutex and also cuts off a few K of overhead from the trace events" * tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix array size mismatch in format string
| * | | tracing: Fix array size mismatch in format stringVaibhav Nagarnaik2014-03-202-9/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In event format strings, the array size is reported in two locations. One in array subscript and then via the "size:" attribute. The values reported there have a mismatch. For e.g., in sched:sched_switch the prev_comm and next_comm character arrays have subscript values as [32] where as the actual field size is 16. name: sched_switch ID: 301 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1;signed:0; field:int common_pid; offset:4; size:4; signed:1; field:char prev_comm[32]; offset:8; size:16; signed:1; field:pid_t prev_pid; offset:24; size:4; signed:1; field:int prev_prio; offset:28; size:4; signed:1; field:long prev_state; offset:32; size:8; signed:1; field:char next_comm[32]; offset:40; size:16; signed:1; field:pid_t next_pid; offset:56; size:4; signed:1; field:int next_prio; offset:60; size:4; signed:1; After bisection, the following commit was blamed: 92edca0 tracing: Use direct field, type and system names This commit removes the duplication of strings for field->name and field->type assuming that all the strings passed in __trace_define_field() are immutable. This is not true for arrays, where the type string is created in event_storage variable and field->type for all array fields points to event_storage. Use __stringify() to create a string constant for the type string. Also, get rid of event_storage and event_storage_mutex that are not needed anymore. also, an added benefit is that this reduces the overhead of events a bit more: text data bss dec hex filename 8424787 2036472 1302528 11763787 b3804b vmlinux 8420814 2036408 1302528 11759750 b37086 vmlinux.patched Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com Cc: Laurent Chavey <chavey@google.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* / / mm: fix swapops.h:131 bug if remap_file_pages raced migrationHugh Dickins2014-03-201-2/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity: indicating that remove_migration_ptes() failed to find one of the migration entries that was temporarily inserted. The problem comes from remap_file_pages()'s switch from vma_interval_tree (good for inserting the migration entry) to i_mmap_nonlinear list (no good for locating it again); but can only be a problem if the remap_file_pages() range does not cover the whole of the vma (zap_pte() clears the range). remove_migration_ptes() needs a file_nonlinear method to go down the i_mmap_nonlinear list, applying linear location to look for migration entries in those vmas too, just in case there was this race. The file_nonlinear method does need rmap_walk_control.arg to do this; but it never needed vma passed in - vma comes from its own iteration. Reported-and-tested-by: Dave Jones <davej@redhat.com> Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-03-131-1/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "I know this is a bit more than you want to see, and I've told the wireless folks under no uncertain terms that they must severely scale back the extent of the fixes they are submitting this late in the game. Anyways: 1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which is the correct implementation, like it should. Instead it does something like a NAPI poll operation. This leads to crashes. From Neil Horman and Arnd Bergmann. 2) Segmentation of SKBs requires proper socket orphaning of the fragments, otherwise we might access stale state released by the release callbacks. This is a 5 patch fix, but the initial patches are giving variables and such significantly clearer names such that the actual fix itself at the end looks trivial. From Michael S. Tsirkin. 3) TCP control block release can deadlock if invoked from a timer on an already "owned" socket. Fix from Eric Dumazet. 4) In the bridge multicast code, we must validate that the destination address of general queries is the link local all-nodes multicast address. From Linus Lüssing. 5) The x86 BPF JIT support for negative offsets puts the parameter for the helper function call in the wrong register. Fix from Alexei Starovoitov. 6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the r8169 driver is incorrect. Fix from Hayes Wang. 7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see if a packet is a GSO frame, but that's not the correct test. It should use skb_is_gso(skb) instead. Fix from Wei Liu. 8) Negative msg->msg_namelen values should generate an error, from Matthew Leach. 9) at86rf230 can deadlock because it takes the same lock from it's ISR and it's hard_start_xmit method, without disabling interrupts in the latter. Fix from Alexander Aring. 10) The FEC driver's restart doesn't perform operations in the correct order, so promiscuous settings can get lost. Fix from Stefan Wahren. 11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann. 12) Reference count and memory leak fixes in TIPC from Ying Xue and Erik Hugne. 13) Forced eviction in inet_frag_evictor() must strictly make sure all frags are deleted, otherwise module unload (f.e. 6lowpan) can crash. Fix from Florian Westphal. 14) Remove assumptions in AF_UNIX's use of csum_partial() (which it uses as a hash function), which breaks on PowerPC. From Anton Blanchard. The main gist of the issue is that csum_partial() is defined only as a value that, once folded (f.e. via csum_fold()) produces a correct 16-bit checksum. It is legitimate, therefore, for csum_partial() to produce two different 32-bit values over the same data if their respective alignments are different. 15) Fix endiannes bug in MAC address handling of ibmveth driver, also from Anton Blanchard. 16) Error checks for ipv6 exthdrs offload registration are reversed, from Anton Nayshtut. 17) Externally triggered ipv6 addrconf routes should count against the garbage collection threshold. Fix from Sabrina Dubroca. 18) The PCI shutdown handler added to the bnx2 driver can wedge the chip if it was not brought up earlier already, which in particular causes the firmware to shut down the PHY. Fix from Michael Chan. 19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as currently coded it can and does trigger in legitimate situations. From Eric Dumazet. 20) BNA driver fails to build on ARM because of a too large udelay() call, fix from Ben Hutchings. 21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix from Eric Dumazet. 22) The vlan passthrough ops added in the previous release causes a regression in source MAC address setting of outgoing headers in some circumstances. Fix from Peter Boström" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits) ipv6: Avoid unnecessary temporary addresses being generated eth: fec: Fix lost promiscuous mode after reconnecting cable bonding: set correct vlan id for alb xmit path at86rf230: fix lockdep splats net/mlx4_en: Deregister multicast vxlan steering rules when going down vmxnet3: fix building without CONFIG_PCI_MSI MAINTAINERS: add networking selftests to NETWORKING net: socket: error on a negative msg_namelen MAINTAINERS: Add tools/net to NETWORKING [GENERAL] packet: doc: Spelling s/than/that/ net/mlx4_core: Load the IB driver when the device supports IBoE net/mlx4_en: Handle vxlan steering rules for mac address changes net/mlx4_core: Fix wrong dump of the vxlan offloads device capability xen-netback: use skb_is_gso in xenvif_start_xmit r8169: fix the incorrect tx descriptor version tools/net/Makefile: Define PACKAGE to fix build problems x86: bpf_jit: support negative offsets bridge: multicast: enable snooping on general queries only bridge: multicast: add sanity check for general query destination tcp: tcp_release_cb() should release socket ownership ...
| * | tcp: tcp_release_cb() should release socket ownershipEric Dumazet2014-03-111-0/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lars Persson reported following deadlock : -000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock -001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0 -002 |ip_local_deliver_finish(skb = 0x8BD527A0) -003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?) -004 |netif_receive_skb(skb = 0x8BD527A0) -005 |elk_poll(napi = 0x8C770500, budget = 64) -006 |net_rx_action(?) -007 |__do_softirq() -008 |do_softirq() -009 |local_bh_enable() -010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?) -011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0) -012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0) -013 |tcp_release_cb(sk = 0x8BE6B2A0) -014 |release_sock(sk = 0x8BE6B2A0) -015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?) -016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096) -017 |kernel_sendmsg(?, ?, ?, ?, size = 4096) -018 |smb_send_kvec() -019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0) -020 |cifs_call_async() -021 |cifs_async_writev(wdata = 0x87FD6580) -022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88) -023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88) -024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88) -025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88) -026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88) -027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0) -028 |bdi_writeback_workfn(work = 0x87E4A9CC) -029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC) -030 |worker_thread(__worker = 0x8B045880) -031 |kthread(_create = 0x87CADD90) -032 |ret_from_kernel_thread(asm) Bug occurs because __tcp_checksum_complete_user() enables BH, assuming it is running from softirq context. Lars trace involved a NIC without RX checksum support but other points are problematic as well, like the prequeue stuff. Problem is triggered by a timer, that found socket being owned by user. tcp_release_cb() should call tcp_write_timer_handler() or tcp_delack_timer_handler() in the appropriate context : BH disabled and socket lock held, but 'owned' field cleared, as if they were running from timer handlers. Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events") Reported-by: Lars Persson <lars.persson@axis.com> Tested-by: Lars Persson <lars.persson@axis.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Improve SO_TIMESTAMPING documentation and fix a minor code bugAndrew Lutomirski2014-03-061-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The original documentation was very unclear. The code fix is presumably related to the formerly unclear documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on __sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is set is pointless. This should have no user-observable effect. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-03-121-0/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "The ARM patch fixes a build breakage with randconfig. The x86 one fixes Windows guests on AMD processors" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: fix cr8 intercept window ARM: KVM: fix non-VGIC compilation
| * | ARM: KVM: fix non-VGIC compilationMarc Zyngier2014-03-061-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a stub for kvm_vgic_addr when compiling without CONFIG_KVM_ARM_VGIC. The usefulness of this configurarion is extremely doubtful, but let's fix it anyway (until we decide that we'll always support a VGIC). Reported-by: Michele Paolino <m.paolino@virtualopensystems.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2014-03-111-1/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull audit namespace fixes from Eric Biederman: "Starting with 3.14-rc1 the audit code is faulty (think oopses and races) with respect to how it computes the network namespace of which socket to reply to, and I happened to notice by chance when reading through the code. My testing and the automated build bots don't find any problems with these fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: audit: Update kdoc for audit_send_reply and audit_list_rules_send audit: Send replies in the proper network namespace. audit: Use struct net not pid_t to remember the network namespce to reply in
| * | | audit: Send replies in the proper network namespace.Eric W. Biederman2014-02-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In perverse cases of file descriptor passing the current network namespace of a process and the network namespace of a socket used by that socket may differ. Therefore use the network namespace of the appropiate socket to ensure replies always go to the appropiate socket. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | | | Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds2014-03-103-3/+7
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "Nine fixes" * emailed patches from Andrew Morton akpm@linux-foundation.org>: cris: convert ffs from an object-like macro to a function-like macro hfsplus: add HFSX subfolder count support tools/testing/selftests/ipc/msgque.c: handle msgget failure return correctly MAINTAINERS: blackfin: add git repository revert "kallsyms: fix absolute addresses for kASLR" mm/Kconfig: fix URL for zsmalloc benchmark fs/proc/base.c: fix GPF in /proc/$PID/map_files mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block mm: fix GFP_THISNODE callers and clarify
| * | | | mm: fix GFP_THISNODE callers and clarifyJohannes Weiner2014-03-103-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFP_THISNODE is for callers that implement their own clever fallback to remote nodes. It restricts the allocation to the specified node and does not invoke reclaim, assuming that the caller will take care of it when the fallback fails, e.g. through a subsequent allocation request without GFP_THISNODE set. However, many current GFP_THISNODE users only want the node exclusive aspect of the flag, without actually implementing their own fallback or triggering reclaim if necessary. This results in things like page migration failing prematurely even when there is easily reclaimable memory available, unless kswapd happens to be running already or a concurrent allocation attempt triggers the necessary reclaim. Convert all callsites that don't implement their own fallback strategy to __GFP_THISNODE. This restricts the allocation a single node too, but at the same time allows the allocator to enter the slowpath, wake kswapd, and invoke direct reclaim if necessary, to make the allocation happen when memory is full. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-03-102-14/+21
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro. Clean up file table accesses (get rid of fget_light() in favor of the fdget() interface), add proper file position locking. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of fget_light() sockfd_lookup_light(): switch to fdget^W^Waway from fget_light vfs: atomic f_pos accesses as per POSIX ocfs2 syncs the wrong range...
| * | | | get rid of fget_light()Al Viro2014-03-102-11/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of returning the flags by reference, we can just have the low-level primitive return those in lower bits of unsigned long, with struct file * derived from the rest. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | vfs: atomic f_pos accesses as per POSIXLinus Torvalds2014-03-102-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our write() system call has always been atomic in the sense that you get the expected thread-safe contiguous write, but we haven't actually guaranteed that concurrent writes are serialized wrt f_pos accesses, so threads (or processes) that share a file descriptor and use "write()" concurrently would quite likely overwrite each others data. This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says: "2.9.7 Thread Interactions with Regular File Operations All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: [...]" and one of the effects is the file position update. This unprotected file position behavior is not new behavior, and nobody has ever cared. Until now. Yongzhi Pan reported unexpected behavior to Michael Kerrisk that was due to this. This resolves the issue with a f_pos-specific lock that is taken by read/write/lseek on file descriptors that may be shared across threads or processes. Reported-by: Yongzhi Pan <panyongzhi@gmail.com> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | Merge tag 'fixes-for-linus' of ↵Linus Torvalds2014-03-091-0/+4
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from from Olof Johansson: "A collection of fixes for ARM platforms. A little large due to us missing to do one last week, but there's nothing in particular here that is in itself large and scary. Mostly a handful of smaller fixes all over the place. The majority is made up of fixes for OMAP, but there are a few for others as well. In particular, there was a decision to rename a binding for the Broadcom pinctrl block that we need to go in before the final release since we then treat it as ABI" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: omap3-gta04: Add ti,omap36xx to compatible property to avoid problems with booting ARM: tegra: add LED options back into tegra_defconfig ARM: dts: omap3-igep: fix boot fail due wrong compatible match ARM: OMAP3: Fix pinctrl interrupts for core2 pinctrl: Rename Broadcom Capri pinctrl binding pinctrl: refer to updated dt binding string. Update dtsi with new pinctrl compatible string ARM: OMAP: Kill warning in CPUIDLE code with !CONFIG_SMP ARM: OMAP2+: Add support for thumb mode on DT booted N900 ARM: OMAP2+: clock: fix clkoutx2 with CLK_SET_RATE_PARENT ARM: OMAP4: hwmod: Fix SOFTRESET logic for OMAP4 ARM: DRA7: hwmod data: correct the sysc data for spinlock ARM: OMAP5: PRM: Fix reboot handling ARM: sunxi: dt: Change the touchscreen compatibles ARM: sun7i: dt: Fix interrupt trigger types
| * \ \ \ \ Merge tag 'bcm-for-3.14-pinctrl-reduced-rename' of ↵Olof Johansson2014-03-0817-30/+132
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://github.com/broadcom/bcm11351 into fixes Merge 'bcm pinctrl rename' From Christin Daudt: Rename pinctrl dt binding to restore consistency with other bcm mobile bindings. * tag 'bcm-for-3.14-pinctrl-reduced-rename' of git://github.com/broadcom/bcm11351: pinctrl: Rename Broadcom Capri pinctrl binding pinctrl: refer to updated dt binding string. Update dtsi with new pinctrl compatible string + Linux 3.14-rc4 Signed-off-by: Olof Johansson <olof@lixom.net>
| * \ \ \ \ \ Merge tag 'omap-for-v3.14/fixes-rc4' of ↵Arnd Bergmann2014-02-281-0/+4
| |\ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Omap fixes from Tony Lindgren: Fixes for omaps mostly to fix the 3430 display regression, and random crashes if booting n900 with device tree and thumb mode. Also few other regressions and fixes. * tag 'omap-for-v3.14/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP3: Fix pinctrl interrupts for core2 ARM: OMAP: Kill warning in CPUIDLE code with !CONFIG_SMP ARM: OMAP2+: Add support for thumb mode on DT booted N900 ARM: OMAP2+: clock: fix clkoutx2 with CLK_SET_RATE_PARENT ARM: OMAP4: hwmod: Fix SOFTRESET logic for OMAP4 ARM: DRA7: hwmod data: correct the sysc data for spinlock ARM: OMAP5: PRM: Fix reboot handling Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| | * | | | | ARM: OMAP2+: clock: fix clkoutx2 with CLK_SET_RATE_PARENTTomi Valkeinen2014-02-191-0/+4
| | | |_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If CLK_SET_RATE_PARENT is set for a clkoutx2 clock, calling clk_set_rate() on the clock "skips" the x2 multiplier as there are no set_rate and round_rate functions defined for the clkoutx2. This results in getting double the requested clock rates, breaking the display on omap3430 based devices. This got broken when d0f58bd3bba3877fb1af4664c4e33273d36f00e4 and related patches were merged for v3.14, as omapdss driver now relies more on the clk-framework and CLK_SET_RATE_PARENT. This patch implements set_rate and round_rate for clkoutx2. Tested on OMAP3430, OMAP3630, OMAP4460. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
* | | | | | Merge tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-03-092-2/+7
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Fix another nfs4_sequence corruptor in RELEASE_LOCKOWNER - Fix an Oopsable delegation callback race - Fix another bad stateid infinite loop - Fail the data server I/O is the stateid represents a lost lock - Fix an Oopsable sunrpc trace event" * tag 'nfs-for-3.14-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix oops when trace sunrpc_task events in nfs client NFSv4: Fail the truncate() if the lock/open stateid is invalid NFSv4.1 Fail data server I/O if stateid represents a lost lock NFSv4: Fix the return value of nfs4_select_rw_stateid NFSv4: nfs4_stateid_is_current should return 'true' for an invalid stateid NFS: Fix a delegation callback race NFSv4: Fix another nfs4_sequence corruptor
| * | | | | | SUNRPC: Fix oops when trace sunrpc_task events in nfs clientDitang Chen2014-03-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When tracking sunrpc_task events in nfs client, the clnt pointer may be NULL. [ 139.269266] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 139.269915] IP: [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] PGD 1d293067 PUD 1d294067 PMD 0 [ 139.269915] Oops: 0000 [#1] SMP [ 139.269915] Modules linked in: nfsv4 dns_resolver nfs lockd sunrpc fscache sg ppdev e1000 serio_raw pcspkr parport_pc parport i2c_piix4 i2c_core microcode xfs libcrc32c sd_mod sr_mod cdrom ata_generic crc_t10dif crct10dif_common pata_acpi ahci libahci ata_piix libata dm_mirror dm_region_hash dm_log dm_mod [ 139.269915] CPU: 0 PID: 59 Comm: kworker/0:2 Not tainted 3.10.0-84.el7.x86_64 #1 [ 139.269915] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 139.269915] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 139.269915] task: ffff88001b598000 ti: ffff88001b632000 task.ti: ffff88001b632000 [ 139.269915] RIP: 0010:[<ffffffffa026f216>] [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] RSP: 0018:ffff88001b633d70 EFLAGS: 00010206 [ 139.269915] RAX: ffff88001dfc5338 RBX: ffff88001cc37a00 RCX: ffff88001dfc5334 [ 139.269915] RDX: ffff88001dfc5338 RSI: 0000000000000000 RDI: ffff88001dfc533c [ 139.269915] RBP: ffff88001b633db0 R08: 000000000000002c R09: 000000000000000a [ 139.269915] R10: 0000000000062180 R11: 00000020759fb9dc R12: ffffffffa0292c20 [ 139.269915] R13: ffff88001dfc5334 R14: 0000000000000000 R15: 0000000000000000 [ 139.269915] FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 139.269915] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 139.269915] CR2: 0000000000000004 CR3: 000000001d290000 CR4: 00000000000006f0 [ 139.269915] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 139.269915] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 139.269915] Stack: [ 139.269915] 000000001b633d98 0000000000000246 ffff88001df1dc00 ffff88001cc37a00 [ 139.269915] ffff88001bc35e60 0000000000000000 ffff88001ffa0a48 ffff88001bc35ee0 [ 139.269915] ffff88001b633e08 ffffffffa02704b5 0000000000010000 ffff88001cc37a70 [ 139.269915] Call Trace: [ 139.269915] [<ffffffffa02704b5>] __rpc_execute+0x1d5/0x400 [sunrpc] [ 139.269915] [<ffffffffa0270706>] rpc_async_schedule+0x26/0x30 [sunrpc] [ 139.269915] [<ffffffff8107867b>] process_one_work+0x17b/0x460 [ 139.269915] [<ffffffff8107942b>] worker_thread+0x11b/0x400 [ 139.269915] [<ffffffff81079310>] ? rescuer_thread+0x3e0/0x3e0 [ 139.269915] [<ffffffff8107fc80>] kthread+0xc0/0xd0 [ 139.269915] [<ffffffff8107fbc0>] ? kthread_create_on_node+0x110/0x110 [ 139.269915] [<ffffffff815d122c>] ret_from_fork+0x7c/0xb0 [ 139.269915] [<ffffffff8107fbc0>] ? kthread_create_on_node+0x110/0x110 [ 139.269915] Code: 4c 8b 45 c8 48 8d 7d d0 89 4d c4 41 89 c9 b9 28 00 00 00 e8 9d b4 e9 e0 48 85 c0 49 89 c5 74 a2 48 89 c7 e8 9d 3f e9 e0 48 89 c2 <41> 8b 46 04 48 8b 7d d0 4c 89 e9 4c 89 e6 89 42 0c 0f b7 83 d4 [ 139.269915] RIP [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] RSP <ffff88001b633d70> [ 139.269915] CR2: 0000000000000004 [ 140.946406] ---[ end trace ba486328b98d7622 ]--- Signed-off-by: Ditang Chen <chendt.fnst@cn.fujitsu.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | NFSv4: Fix another nfs4_sequence corruptorTrond Myklebust2014-03-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs4_release_lockowner needs to set the rpc_message reply to point to the nfs4_sequence_res in order to avoid another Oopsable situation in nfs41_assign_slot. Fixes: fbd4bfd1d9d21 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER) Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds2014-03-091-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI target fixes from Nicholas Bellinger: "This series addresses a number of outstanding issues wrt to active I/O shutdown using iser-target. This includes: - Fix a long standing tpg_state bug where a tpg could be referenced during explicit shutdown (v3.1+ stable) - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks work as expected (v3.10+ stable) - Fix a isert_conn->state related hung task bug + ensure outstanding I/O completes during session shutdown. (v3.10+ stable) - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs (v3.10+ stable) - Ignore FRWR completions during active I/O shutdown (v3.12+ stable) - Fix command leakage for interrupt coalescing during active I/O shutdown (v3.13+ stable) Also included is another DIF emulation fix from Sagi specific to v3.14-rc code" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: Target/sbc: Fix sbc_copy_prot for offset scatters iser-target: Fix command leak for tx_desc->comp_llnode_batch iser-target: Ignore completions for FRWRs in isert_cq_tx_work iser-target: Fix post_send_buf_count for RDMA READ/WRITE iscsi/iser-target: Fix isert_conn->state hung shutdown issues iscsi/iser-target: Use list_del_init for ->i_conn_node iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
| * | | | | | | iscsi/iser-target: Fix isert_conn->state hung shutdown issuesNicholas Bellinger2014-03-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses a couple of different hug shutdown issues related to wait_event() + isert_conn->state. First, it changes isert_conn->conn_wait + isert_conn->conn_wait_comp_err from waitqueues to completions, and sets ISER_CONN_TERMINATING from within isert_disconnect_work(). Second, it splits isert_free_conn() into isert_wait_conn() that is called earlier in iscsit_close_connection() to ensure that all outstanding commands have completed before continuing. Finally, it breaks isert_cq_comp_err() into seperate TX / RX related code, and adds logic in isert_cq_rx_comp_err() to wait for outstanding commands to complete before setting ISER_CONN_DOWN and calling complete(&isert_conn->conn_wait_comp_err). Acked-by: Sagi Grimberg <sagig@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> #3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* | | | | | | | Merge branch 'for-3.14-fixes' of ↵Linus Torvalds2014-03-081-0/+1
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "This pull request contains a workqueue usage fix for firewire. For quite a long time now, workqueue only treats two work items identical iff both their addresses and callbacks match. This is to avoid introducing false dependency through the work item being recycled while being executed. This changes non-reentrancy guarantee for the users of PREPARE[_DELAYED]_WORK() - if the function changes, reentrancy isn't guaranteed against the previous instance. Firewire depended on such nonreentrancy guarantee. This is fixed by doing the work item multiplexing from firewire proper while keeping the work function unchanged" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: firewire: don't use PREPARE_DELAYED_WORK
| * | | | | | | | firewire: don't use PREPARE_DELAYED_WORKTejun Heo2014-03-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PREPARE_[DELAYED_]WORK() are being phased out. They have few users and a nasty surprise in terms of reentrancy guarantee as workqueue considers work items to be different if they don't have the same work function. firewire core-device and sbp2 have been been multiplexing work items with multiple work functions. Introduce fw_device_workfn() and sbp2_lu_workfn() which invoke fw_device->workfn and sbp2_logical_unit->workfn respectively and always use the two functions as the work functions and update the users to set the ->workfn fields instead of overriding work functions using PREPARE_DELAYED_WORK(). This fixes a variety of possible regressions since a2c1c57be8d9 "workqueue: consider work function when searching for busy work items" due to which fw_workqueue lost its required non-reentrancy property. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: linux1394-devel@lists.sourceforge.net Cc: stable@vger.kernel.org # v3.9+ Cc: stable@vger.kernel.org # v3.8.2+ Cc: stable@vger.kernel.org # v3.4.60+ Cc: stable@vger.kernel.org # v3.2.40+
* | | | | | | | | Merge tag 'trace-fixes-v3.14-rc5' of ↵Linus Torvalds2014-03-071-0/+6
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "In the past, I've had lots of reports about trace events not working. Developers would say they put a trace_printk() before and after the trace event but when they enable it (and the trace event said it was enabled) they would see the trace_printks but not the trace event. I was not able to reproduce this, but that's because I wasn't looking at the right location. Recently, another bug came up that showed the issue. If your kernel supports signed modules but allows for non-signed modules to be loaded, then when one is, the kernel will silently set the MODULE_FORCED taint on the module. Although, this taint happens without the need for insmod --force or anything of the kind, it labels the module with that taint anyway. If this tainted module has tracepoints, the tracepoints will be ignored because of the MODULE_FORCED taint. But no error message will be displayed. Worse yet, the event infrastructure will still be created letting users enable the trace event represented by the tracepoint, although that event will never actually be enabled. This is because the tracepoint infrastructure allows for non-existing tracepoints to be enabled for new modules to arrive and have their tracepoints set. Although there are several things wrong with the above, this change only addresses the creation of the trace event files for tracepoints that are not created when a module is loaded and is tainted. This change will print an error message about the module being tainted and not the trace events will not be created, and it does not create the trace event infrastructure" * tag 'trace-fixes-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not add event files for modules that fail tracepoints
| * | | | | | | | | tracing: Do not add event files for modules that fail tracepointsSteven Rostedt (Red Hat)2014-03-031-0/+6
| | |_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a module fails to add its tracepoints due to module tainting, do not create the module event infrastructure in the debugfs directory. As the events will not work and worse yet, they will silently fail, making the user wonder why the events they enable do not display anything. Having a warning on module load and the events not visible to the users will make the cause of the problem much clearer. Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org Fixes: 6d723736e472 "tracing/events: add support for modules to TRACE_EVENT" Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org # 2.6.31+ Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2014-03-071-3/+8
|\ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Small collection of fixes for 3.14-rc. It contains: - Three minor update to blk-mq from Christoph. - Reduce number of unaligned (< 4kb) in-flight writes on mtip32xx to two. From Micron. - Make the blk-mq CPU notify spinlock raw, since it can't be a sleeper spinlock on RT. From Mike Galbraith. - Drop now bogus BUG_ON() for bio iteration with blk integrity. From Nic Bellinger. - Properly propagate the SYNC flag on requests. From Shaohua" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock bio-integrity: Drop bio_integrity_verify BUG_ON in post bip->bip_iter world blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq mtip32xx: Reduce the number of unaligned writes to 2
| * | | | | | | | blk-mq: support partial I/O completionsChristoph Hellwig2014-02-211-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new blk_mq_end_io_partial function to partially complete requests as needed by the SCSI layer. We do this by reusing blk_update_request to advance the bio instead of having a simplified version of it in the blk-mq code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>