summaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* net: core: explicitly select a txq before doing l2 forwardingJason Wang2014-01-101-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NICWei-Chun Chao2014-01-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | VM to VM GSO traffic is broken if it goes through VXLAN or GRE tunnel and the physical NIC on the host supports hardware VXLAN/GRE GSO offload (e.g. bnx2x and next-gen mlx4). Two issues - (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header integrity check fails in udp4_ufo_fragment if inner protocol is TCP. Also gso_segs is calculated incorrectly using skb->len that includes tunnel header. Fix: robust check should only be applied to the inner packet. (VXLAN & GRE) Once GSO header integrity check passes, NULL segs is returned and the original skb is sent to hardware. However the tunnel header is already pulled. Fix: tunnel header needs to be restored so that hardware can perform GSO properly on the original packet. Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: Fix header ops passthru when doing TX VLAN offload.David S. Miller2013-12-311-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-12-302-0/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Some holiday bug fixes for 3.13... There is still one bug I'd like to get fixed before 3.13-final. The vlan code erroneously assignes the header ops of the underlying real device to the VLAN device above it when the real device can hardware offload VLAN handling. That's completely bogus because header ops are tied to the device type, so they only expect to see a 'dev' argument compatible with their ops. The fix is the have the VLAN code use a special set of header ops that does the pass-thru correctly, by calling the underlying real device's header ops but _also_ passing in the real device instead of the VLAN device. That fix is currently waiting some testing. Anyways, of note here: 1) Fix bitmap edge case in radiotap, from Johannes Berg. 2) Fix oops on driver unload in rtlwifi, from Larry Finger. 3) Bonding doesn't do locking correctly during speed/duplex/link changes, from Ding Tianhong. 4) Fix header parsing in GRE code, this bug has been around for a few releases. From Timo Teräs. 5) SIT tunnel driver MTU check needs to take GSO into account, from Eric Dumazet. 6) Minor info leak in inet_diag, from Daniel Borkmann. 7) Info leak in YAM hamradio driver, from Salva Peiró. 8) Fix route expiration state handling in ipv6 routing code, from Li RongQing. 9) DCCP probe module does not check request_module()'s return value, from Wang Weidong. 10) cpsw driver passes NULL device names to request_irq(), from Mugunthan V N. 11) Prevent a NULL splat in RDS binding code, from Sasha Levin. 12) Fix 4G overflow test in tg3 driver, from Nithin Sujir. 13) Cure use after free in arc_emac and fec driver's software timestamp handling, from Eric Dumazet. 14) SIT driver can fail to release the route when iptunnel_handle_offloads() throws an error. From Li RongQing. 15) Several batman-adv fixes from Simon Wunderlich and Antonio Quartulli. 16) Fix deadlock during TIPC socket release, from Ying Xue. 17) Fix regression in ROSE protocol recvmsg() msg_name handling, from Florian Westphal. 18) stmmac PTP support releases wrong spinlock, from Vince Bridgers" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits) stmmac: Fix incorrect spinlock release and PTP cap detection. phy: IRQ cannot be shared net: rose: restore old recvmsg behavior xen-netback: fix guest-receive-side array sizes fec: Do not assume that PHY reset is active low tipc: fix deadlock during socket release netfilter: nf_tables: fix wrong datatype in nft_validate_data_load() batman-adv: fix vlan header access batman-adv: clean nf state when removing protocol header batman-adv: fix alignment for batadv_tvlv_tt_change batman-adv: fix size of batadv_bla_claim_dst batman-adv: fix size of batadv_icmp_header batman-adv: fix header alignment by unrolling batadv_header batman-adv: fix alignment for batadv_coded_packet netfilter: nf_tables: fix oops when updating table with user chains netfilter: nf_tables: fix dumping with large number of sets ipv6: release dst properly in ipip6_tunnel_xmit netxen: Correct off-by-one errors in bounds checks net: Add some clarification to skb_tx_timestamp() comment. arc_emac: fix potential use after free ...
| * net: Add some clarification to skb_tx_timestamp() comment.David S. Miller2013-12-271-0/+4
| | | | | | | | | | | | | | | | We've seen so many instances of people invoking skb_tx_timestamp() after the device already has been given the packet, that it's worth being a little bit more verbose and explicit in this comment. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2013-12-191-0/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates This series contains updates to net, ixgbe and e1000e. David provides compiler fixes for e1000e. Don provides a fix for ixgbe to resolve a compile warning. John provides a fix to net where it is useful to be able to walk all upper devices when bringing a device online where the RTNL lock is held. In this case, it is safe to walk the all_adj_list because the RTNL lock is used to protect the write side as well. This patch adds a check to see if the RTNL lock is held before throwing a warning in netdev_all_upper_get_next_dev_rcu(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock heldJohn Fastabend2013-12-171-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is useful to be able to walk all upper devices when bringing a device online where the RTNL lock is held. In this case it is safe to walk the all_adj_list because the RTNL lock is used to protect the write side as well. This patch adds a check to see if the rtnl lock is held before throwing a warning in netdev_all_upper_get_next_dev_rcu(). Also because we now have a call site for lockdep_rtnl_is_held() outside COFIG_LOCK_PROVING an inline definition returning 1 is needed. Similar to the rcu_read_lock_is_held(). Fixes: 2a47fa45d4df ("ixgbe: enable l2 forwarding acceleration for macvlans") CC: Veaceslav Falico <vfalico@redhat.com> Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | ip_gre: fix msg_name parsing for recvfrom/recvmsgTimo Teräs2013-12-181-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipgre_header_parse() needs to parse the tunnel's ip header and it uses mac_header to locate the iphdr. This got broken when gre tunneling was refactored as mac_header is no longer updated to point to iphdr. Introduce skb_pop_mac_header() helper to do the mac_header assignment and use it in ipgre_rcv() to fix msg_name parsing. Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.) Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'for-3.13-fixes' of ↵Linus Torvalds2013-12-241-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fix from Tejun Heo: "A single commit to fix a spurious sparse warning coming from DEFINE_PER_CPU()'s hack to support the use of weak symbols. Shouldn't cause observable behavior change" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: fix spurious sparse warnings from DEFINE_PER_CPU()
| * | | percpu: fix spurious sparse warnings from DEFINE_PER_CPU()Tejun Heo2013-12-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_DEBUG_FORCE_WEAK_PER_CPU or CONFIG_ARCH_NEEDS_WEAK_PER_CPU is set, DEFINE_PER_CPU() explodes into cryptic series of definitions to still allow using "static" for percpu variables while keeping all per-cpu symbols unique in the kernel image which is required for weak symbols. This ultimately converts the actual symbol to global whether DEFINE_PER_CPU() is prefixed with static or not. Unfortunately, the macro forgot to add explicit extern declartion of the actual symbol ending up defining global symbol without preceding declaration for static definitions which naturally don't have matching DECLARE_PER_CPU(). The only ill effect is triggering of the following warnings. fs/inode.c:74:8: warning: symbol 'nr_inodes' was not declared. Should it be static? fs/inode.c:75:8: warning: symbol 'nr_unused' was not declared. Should it be static? Fix it by adding extern declaration in the DEFINE_PER_CPU() macro. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Tested-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
* | | | Merge branch 'for-3.13-fixes' of ↵Linus Torvalds2013-12-241-0/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "There's one interseting commit - "libata, freezer: avoid block device removal while system is frozen". It's an ugly hack working around a deadlock condition between driver core resume and block layer device removal paths through freezer which was made more reproducible by writeback being converted to workqueue some releases ago. The bug has nothing to do with libata but it's just an workaround which is easy to backport. After discussion, Rafael and I seem to agree that we don't really need kernel freezables - both kthread and workqueue. There are few specific workqueues which constitute PM operations and require freezing, which will be converted to use workqueue_set_max_active() instead. All other kernel freezer uses are planned to be removed, followed by the removal of kthread and workqueue freezer support, hopefully. Others are device-specific fixes. The most notable is the addition of NO_NCQ_TRIM which is used to disable queued TRIM commands to Micro M500 SSDs which otherwise suffers data corruption" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata, freezer: avoid block device removal while system is frozen libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs libata: disable a disk via libata.force params ahci: bail out on ICH6 before using AHCI BAR ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8
| * | | | libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDsMarc Carino2013-12-171-0/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain drives cannot handle queued TRIM commands properly, even though support is indicated in the IDENTIFY DEVICE buffer. This patch allows for disabling the commands for the affected drives and apply it to the Micron/Crucial M500 SSDs which exhibit incorrect protocol behavior when issued queued TRIM commands, which could lead to silent data corruption. tj: Merged two unnecessarily split patches and made minor edits including shortening horkage name. Signed-off-by: Marc Carino <marc.ceeeee@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/g/1387246554-7311-1-git-send-email-marc.ceeeee@gmail.com Cc: stable@vger.kernel.org # 3.12+
* | | | auxvec.h: account for AT_HWCAP2 in AT_VECTOR_SIZE_BASEArd Biesheuvel2013-12-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2171364d1a92 ("powerpc: Add HWCAP2 aux entry") introduced a new AT_ auxv entry type AT_HWCAP2 but failed to update AT_VECTOR_SIZE_BASE accordingly. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2171364d1a92 (powerpc: Add HWCAP2 aux entry) Cc: stable@vger.kernel.org Acked-by: Michael Neuling <michael@neuling.org> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | aio/migratepages: make aio migrate pages saneBenjamin LaHaise2013-12-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arbitrary restriction on page counts offered by the core migrate_page_move_mapping() code results in rather suspicious looking fiddling with page reference counts in the aio_migratepage() operation. To fix this, make migrate_page_move_mapping() take an extra_count parameter that allows aio to tell the code about its own reference count on the page being migrated. While cleaning up aio_migratepage(), make it validate that the old page being passed in is actually what aio_migratepage() expects to prevent misbehaviour in the case of races. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
* | | | pstore: Don't allow high traffic options on fragile devicesLuck, Tony2013-12-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some pstore backing devices use on board flash as persistent storage. These have limited numbers of write cycles so it is a poor idea to use them from high frequency operations. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: do not allocate page->ptl dynamically, if spinlock_t fits to longKirill A. Shutemov2013-12-203-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In struct page we have enough space to fit long-size page->ptl there, but we use dynamically-allocated page->ptl if size(spinlock_t) is larger than sizeof(int). It hurts 64-bit architectures with CONFIG_GENERIC_LOCKBREAK, where sizeof(spinlock_t) == 8, but it easily fits into struct page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: numa: guarantee that tlb_flush_pending updates are visible before page ↵Mel Gorman2013-12-181-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | table updates According to documentation on barriers, stores issued before a LOCK can complete after the lock implying that it's possible tlb_flush_pending can be visible after a page table update. As per revised documentation, this patch adds a smp_mb__before_spinlock to guarantee the correct ordering. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: fix TLB flush race between migration, and change_protection_rangeRik van Riel2013-12-181-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [*] flush TLB reload TLB from new entry read/write new page lose data [*] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: numa: avoid unnecessary disruption of NUMA hinting during migrationMel Gorman2013-12-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_huge_pmd_numa_page() handles the case where there is parallel THP migration. However, by the time it is checked the NUMA hinting information has already been disrupted. This patch adds an earlier check with some helpers. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | kexec: migrate to reboot cpuVivek Goyal2013-12-181-0/+1
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") moved reboot= handling to generic code. In the process it also removed the code in native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0. I guess that thought must have been that all reboot paths are calling migrate_to_reboot_cpu(), so we don't need this special handling. But kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu() so above change broke kexec. Now reboot can happen on non-boot cpu and when INIT is sent in second kerneo to bring up BP, it brings down the machine. So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid this problem. Bisected by WANG Chao. Reported-by: Matthew Whitehead <mwhitehe@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2013-12-172-3/+32
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes for scheduler crashes, each triggers in relatively rare, hardware environment dependent situations" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Rework sched_fair time accounting math64: Add mul_u64_u32_shr() sched: Remove PREEMPT_NEED_RESCHED from generic code sched: Initialize power_orig for overlapping groups
| * | sched/fair: Rework sched_fair time accountingPeter Zijlstra2013-12-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This results in him occasionally seeing time going backwards - which crashes the scheduler ... Most of our time accounting can actually handle that except the most common one; the tick time update of sched_fair. There is a further problem with that code; previously we assumed that because we get a tick every TICK_NSEC our time delta could never exceed 32bits and math was simpler. However, ever since Frederic managed to get NO_HZ_FULL merged; this is no longer the case since now a task can run for a long time indeed without getting a tick. It only takes about ~4.2 seconds to overflow our u32 in nanoseconds. This means we not only need to better deal with time going backwards; but also means we need to be able to deal with large deltas. This patch reworks the entire code and uses mul_u64_u32_shr() as proposed by Andy a long while ago. We express our virtual time scale factor in a u32 multiplier and shift right and the 32bit mul_u64_u32_shr() implementation reduces to a single 32x32->64 multiply if the time delta is still short (common case). For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128. Reported-and-Tested-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Cc: Paul Turner <pjt@google.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | math64: Add mul_u64_u32_shr()Peter Zijlstra2013-12-111-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce mul_u64_u32_shr() as proposed by Andy a while back; it allows using 64x64->128 muls on 64bit archs and recent GCC which defines __SIZEOF_INT128__ and __int128. (This new method will be used by the scheduler.) Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched: Remove PREEMPT_NEED_RESCHED from generic codePeter Zijlstra2013-12-111-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While hunting a preemption issue with Alexander, Ben noticed that the currently generic PREEMPT_NEED_RESCHED stuff is horribly broken for load-store architectures. We currently rely on the IPI to fold TIF_NEED_RESCHED into PREEMPT_NEED_RESCHED, but when this IPI lands while we already have a load for the preempt-count but before the store, the store will erase the PREEMPT_NEED_RESCHED change. The current preempt-count only works on load-store archs because interrupts are assumed to be completely balanced wrt their preempt_count fiddling; the previous preempt_count load will match the preempt_count state after the interrupt and therefore nothing gets lost. This patch removes the PREEMPT_NEED_RESCHED usage from generic code and pushes it into x86 arch code; the generic code goes back to relying on TIF_NEED_RESCHED. Boot tested on x86_64 and compile tested on ppc64. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-and-Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-12-156-23/+24
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't figure out why it breaks things. 2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it was basically doing "x == x", from Dave Jones. 3) Freescale FEC driver was DMA mapping the wrong number of bytes, from Sebastian Siewior. 4) Blackhole and prohibit routes in ipv6 were not doing the right thing because their ->input and ->output methods were not being assigned correctly. Now they behave properly like their ipv4 counterparts. From Kamala R. 5) Several drivers advertise the NETIF_F_FRAGLIST capability, but really do not support this feature and will send garbage packets if fed fraglist SKBs. From Eric Dumazet. 6) Fix long standing user triggerable BUG_ON over loopback in RDS protocol stack, from Venkat Venkatsubra. 7) Several not so common code paths can potentially try to invoke packet scheduler actions that might be NULL without checking. Shore things up by either 1) defining a method as mandatory and erroring on registration if that method is NULL 2) defininig a method as optional and the registration function hooks up a default implementation when NULL is seen. From Jamal Hadi Salim. 8) Fix fragment detection in xen-natback driver, from Paul Durrant. 9) Kill dangling enter_memory_pressure method in cg_proto ops, from Eric W Biederman. 10) SKBs that traverse namespaces should have their local_df cleared, from Hannes Frederic Sowa. 11) IOCB file position is not being updated by macvtap_aio_read() and tun_chr_aio_read(). From Zhi Yong Wu. 12) Don't free virtio_net netdev before releasing all of the NAPI instances. From Andrey Vagin. 13) Procfs entry leak in xt_hashlimit, from Sergey Popovich. 14) IPv6 routes that are no cached routes should not count against the garbage collection limits. We had this almost right, but were missing handling addrconf generated routes properly. From Hannes Frederic Sowa. 15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL route info when they are called, from Stefan Tomanek. 16) TUN and MACVTAP have had truncated packet signalling for some time, fix from Jason Wang. 17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet. 18) xen-netback does not interpret the NAPI budget properly for TX work, fix from Paul Durrant. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits) igb: Fix for issue where values could be too high for udelay function. i40e: fix null dereference xen-netback: fix gso_prefix check net: make neigh_priv_len in struct net_device 16bit instead of 8bit drivers: net: cpsw: fix for cpsw crash when build as modules xen-netback: napi: don't prematurely request a tx event xen-netback: napi: fix abuse of budget sch_tbf: use do_div() for 64-bit divide udp: ipv4: must add synchronization in udp_sk_rx_dst_set() net:fec: remove duplicate lines in comment about errata ERR006358 Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature" 8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature xen-netback: make sure skb linear area covers checksum field net: smc91x: Fix device tree based configuration so it's usable udp: ipv4: fix potential use after free in udp_v4_early_demux() macvtap: signal truncated packets tun: unbreak truncated packet signalling net: sched: htb: fix the calculation of quantum net: sched: tbf: fix the calculation of max_size micrel: add support for KSZ8041RNLI ...
| * | | net: make neigh_priv_len in struct net_device 16bit instead of 8bitSebastian Siewior2013-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | neigh_priv_len is defined as u8. With all debug enabled struct ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96 bytes and here the spinlock with 72 bytes. The size value still fits in this u8 leaving some room for more. On -RT struct ipoib_neigh put on weight and has 392 bytes. The main reason is sk_buff_head with 288 and the fatty here is spinlock with 192 bytes. This does no longer fit into into neigh_priv_len and gcc complains. This patch changes neigh_priv_len from being 8bit to 16bit. Since the following element (dev_id) is 16bit followed by a spinlock which is aligned, the struct remains with a total size of 3200 (allmodconfig) / 2048 (with as much debug off as possible) bytes on x86-64. On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug off as possible) bytes long. The numbers were gained with and without the patch to prove that this change does not increase the size of the struct. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | micrel: add support for KSZ8041RNLISergei Shtylyov2013-12-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Renesas R-Car development boards use KSZ8041RNLI PHY which for some reason has ID of 0x00221537 that is not documented for KSZ8041-family PHYs and does not match the documented ID of 0x0022151x (where 'x' is the revision). We have to add the new #define PHY_ID_* and new ksphy_driver[] entry, almost the same as KSZ8041 one, differing only in the 'phy_id' and 'name' fields. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: unix: allow set_peek_off to failSasha Levin2013-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unix_dgram_recvmsg() will hold the readlock of the socket until recv is complete. In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until unix_dgram_recvmsg() will complete (which can take a while) without allowing us to break out of it, triggering a hung task spew. Instead, allow set_peek_off to fail, this way userspace will not hang. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | xen-netback: fix fragment detection in checksum setupPaul Durrant2013-12-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to detect fragments in checksum_setup() was missing for IPv4 and too eager for IPv6. (It transpires that Windows seems to send IPv6 packets with a fragment header even if they are not a fragment - i.e. offset is zero, and M bit is not set). This patch also incorporates a fix to callers of maybe_pull_tail() where skb->network_header was being erroneously added to the length argument. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> cc: David Miller <davem@davemloft.net> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Revert "net: Handle CHECKSUM_COMPLETE more adequately in pskb_trim_rcsum()."David S. Miller2013-12-021-21/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 018c5bba052b3a383d83cf0c756da0e7bc748397. It causes regressions for people using chips driven by the sungem driver. Suspicion is that the skb->csum value isn't being adjusted properly. The change also has a bug in that if __pskb_trim() fails, we'll leave a corruped skb->csum value in there. We would really need to revert it to it's original value in that case. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | PCI / tg3: Give up chip reset and carrier loss handling if PCI device is not ↵Rafael J. Wysocki2013-12-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | present Modify tg3_chip_reset() and tg3_close() to check if the PCI network adapter device is accessible at all in order to skip poking it or trying to handle a carrier loss in vain when that's not the case. Introduce a special PCI helper function pci_device_is_present() for this purpose. Of course, this uncovers the lack of the appropriate RTNL locking in tg3_suspend() and tg3_resume(), so add that locking in there too. These changes prevent tg3 from burning a CPU at 100% load level for solid several seconds after the Thunderbolt link is disconnected from a Matrox DS1 docking station. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'x86/urgent' of ↵Linus Torvalds2013-12-151-2/+0
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a pretty small batch: The biggest single change is to stop using EFI time services on 32-bit platforms. This matches our current behavior on 64-bit platforms as we already had ruled them out there as being too unreliable. Turns out that affects 32-bit platforms, too. One NULL pointer fix for SGI UV. Two minor build fixes, one of which only affects icc and the other which affects icc and future versions or nonstandard default settings of gcc" * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Don't use (U)EFI time services on 32 bit x86, build, icc: Remove uninitialized_var() from compiler-intel.h x86/UV: Fix NULL pointer dereference in uv_flush_tlb_others() if the 'nobau' boot option is used x86, build: Pass in additional -mno-mmx, -mno-sse options
| * | | | x86, build, icc: Remove uninitialized_var() from compiler-intel.hH. Peter Anvin2013-12-101-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling with icc, <linux/compiler-gcc.h> ends up included because the icc environment defines __GNUC__. Thus, we neither need nor want to have this macro defined in both compiler-gcc.h and compiler-intel.h, and the fact that they are inconsistent just makes the compiler spew warnings. Reported-by: Sunil K. Pandey <sunil.k.pandey@intel.com> Cc: Kevin B. Smith <kevin.b.smith@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-0mbwou1zt7pafij09b897lg3@git.kernel.org Cc: <stable@vger.kernel.org>
* | | | | Merge tag 'pci-v3.13-fixes-2' of ↵Linus Torvalds2013-12-152-15/+18
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI device hotplug - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael Wysocki) Host bridge drivers - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn Helgaas) - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin (Jason Gunthorpe) Miscellaneous - Avoid unnecessary CPU switch when calling .probe() (Alexander Duyck) - Revert "workqueue: allow work_on_cpu() to be called recursively" (Bjorn Helgaas) - Disable Bus Master only on kexec reboot (Khalid Aziz) - Omit PCI ID macro strings to shorten quirk names for LTO (Michal Marek)" * tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers PCI: Disable Bus Master only on kexec reboot PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin PCI: Omit PCI ID macro strings to shorten quirk names PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev() Revert "workqueue: allow work_on_cpu() to be called recursively" PCI: Avoid unnecessary CPU switch when calling driver .probe() method
| * | | | | PCI: Disable Bus Master only on kexec rebootKhalid Aziz2013-12-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a flag to tell the PCI subsystem that kernel is shutting down in preparation to kexec a kernel. Add code in PCI subsystem to use this flag to clear Bus Master bit on PCI devices only in case of kexec reboot. This fixes a power-off problem on Acer Aspire V5-573G and likely other machines and avoids any other issues caused by clearing Bus Master bit on PCI devices in normal shutdown path. The problem was introduced by b566a22c2332 ("PCI: disable Bus Master on PCI device shutdown"). This patch is based on discussion at http://marc.info/?l=linux-pci&m=138425645204355&w=2 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63861 Reported-by: Chang Liu <cl91tp@gmail.com> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: stable@vger.kernel.org # v3.5+
| * | | | | PCI: Omit PCI ID macro strings to shorten quirk namesMichal Marek2013-11-251-15/+15
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pasting the verbatim PCI_(VENDOR|DEVICE)_* macros in the __pci_fixup_* symbol names results in insanely long names such as __pci_fixup_resumePCI_VENDOR_ID_SERVERWORKSPCI_DEVICE_ID_SERVERWORKS_HT1000SBquirk_disable_broadcom_boot_interrupt When Link-Time Optimization adds its numeric suffix to such symbol, it overflows the namebuf[KSYM_NAME_LEN] array in kernel/kallsyms.c. Use the line number instead to create (nearly) unique symbol names. Reported-by: Joe Mario <jmario@redhat.com> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Andi Kleen <ak@linux.intel.com>
* | | | | mfd/rtc: s5m: fix register updating by adding regmap for RTCKrzysztof Kozlowski2013-12-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and add new regmap for RTC. On S5M8767A registers were not properly updated and read due to usage of the same regmap as the PMIC. This could be observed in various hangs, e.g. in infinite loop during waiting for UDR field change. On this chip family the RTC has different I2C address than PMIC so additional regmap is needed. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | include/linux/kernel.h: make might_fault() a nop for !MMUAxel Lin2013-12-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The machine cannot fault if !MUU, so make might_fault() a nop for !MMU. This fixes below build error if !CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y): arch/arm/kernel/built-in.o: In function `arch_ptrace': arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault' arch/arm/kernel/built-in.o: In function `restore_sigframe': arch/arm/kernel/signal.c:173: undefined reference to `might_fault' ... arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow make: *** [vmlinux] Error 1 Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | include/linux/hugetlb.h: make isolate_huge_page() an inlineNaoya Horiguchi2013-12-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_HUGETLBFS=n: mm/migrate.c: In function `do_move_page_to_node_array': include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value] #define isolate_huge_page(p, l) false ^ mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page' isolate_huge_page(page, &pagelist); Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | dcache: allow word-at-a-time name hashing with big-endian CPUsWill Deacon2013-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When explicitly hashing the end of a string with the word-at-a-time interface, we have to be careful which end of the word we pick up. On big-endian CPUs, the upper-bits will contain the data we're after, so ensure we generate our masks accordingly (and avoid hashing whatever random junk may have been sitting after the string). This patch adds a new dcache helper, bytemask_from_count, which creates a mask appropriate for the CPU endianness. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'keys-devel-20131210' of ↵Linus Torvalds2013-12-122-3/+5
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc keyrings fixes from David Howells: "These break down into five sets: - A patch to error handling in the big_key type for huge payloads. If the payload is larger than the "low limit" and the backing store allocation fails, then big_key_instantiate() doesn't clear the payload pointers in the key, assuming them to have been previously cleared - but only one of them is. Unfortunately, the garbage collector still calls big_key_destroy() when sees one of the pointers with a weird value in it (and not NULL) which it then tries to clean up. - Three patches to fix the keyring type: * A patch to fix the hash function to correctly divide keyrings off from keys in the topology of the tree inside the associative array. This is only a problem if searching through nested keyrings - and only if the hash function incorrectly puts the a keyring outside of the 0 branch of the root node. * A patch to fix keyrings' use of the associative array. The __key_link_begin() function initially passes a NULL key pointer to assoc_array_insert() on the basis that it's holding a place in the tree whilst it does more allocation and stuff. This is only a problem when a node contains 16 keys that match at that level and we want to add an also matching 17th. This should easily be manufactured with a keyring full of keyrings (without chucking any other sort of key into the mix) - except for (a) above which makes it on average adding the 65th keyring. * A patch to fix searching down through nested keyrings, where any keyring in the set has more than 16 keyrings and none of the first keyrings we look through has a match (before the tree iteration needs to step to a more distal node). Test in keyutils test suite: http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1 - A patch to fix the big_key type's use of a shmem file as its backing store causing audit messages and LSM check failures. This is done by setting S_PRIVATE on the file to avoid LSM checks on the file (access to the shmem file goes through the keyctl() interface and so is gated by the LSM that way). This isn't normally a problem if a key is used by the context that generated it - and it's currently only used by libkrb5. Test in keyutils test suite: http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e - A patch to add a generated file to .gitignore. - A patch to fix the alignment of the system certificate data such that it it works on s390. As I understand it, on the S390 arch, symbols must be 2-byte aligned because loading the address discards the least-significant bit" * tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: correct alignment of system_certificate_list content in assembly file Ignore generated file kernel/x509_certificate_list security: shmem: implement kernel private shmem inodes KEYS: Fix searching of nested keyrings KEYS: Fix multiple key add into associative array KEYS: Fix the keyring hash function KEYS: Pre-clear struct key on allocation
| * | | | security: shmem: implement kernel private shmem inodesEric Paris2013-12-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a problem where the big_key key storage implementation uses a shmem backed inode to hold the key contents. Because of this detail of implementation LSM checks are being done between processes trying to read the keys and the tmpfs backed inode. The LSM checks are already being handled on the key interface level and should not be enforced at the inode level (since the inode is an implementation detail, not a part of the security model) This patch implements a new function shmem_kernel_file_setup() which returns the equivalent to shmem_file_setup() only the underlying inode has S_PRIVATE set. This means that all LSM checks for the inode in question are skipped. It should only be used for kernel internal operations where the inode is not exposed to userspace without proper LSM checking. It is possible that some other users of shmem_file_setup() should use the new interface, but this has not been explored. Reproducing this bug is a little bit difficult. The steps I used on Fedora are: (1) Turn off selinux enforcing: setenforce 0 (2) Create a huge key k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s` (3) Access the key in another context: runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null (4) Examine the audit logs: ausearch -m AVC -i --subject httpd_t | audit2allow If the last command's output includes a line that looks like: allow httpd_t user_tmpfs_t:file { open read }; There was an inode check between httpd and the tmpfs filesystem. With this patch no such denial will be seen. (NOTE! you should clear your audit log if you have tested for this previously) (Please return you box to enforcing) Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Hugh Dickins <hughd@google.com> cc: linux-mm@kvack.org
| * | | | KEYS: Fix multiple key add into associative arrayDavid Howells2013-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If sufficient keys (or keyrings) are added into a keyring such that a node in the associative array's tree overflows (each node has a capacity N, currently 16) and such that all N+1 keys have the same index key segment for that level of the tree (the level'th nibble of the index key), then assoc_array_insert() calls ops->diff_objects() to indicate at which bit position the two index keys vary. However, __key_link_begin() passes a NULL object to assoc_array_insert() with the intention of supplying the correct pointer later before we commit the change. This means that keyring_diff_objects() is given a NULL pointer as one of its arguments which it does not expect. This results in an oops like the attached. With the previous patch to fix the keyring hash function, this can be forced much more easily by creating a keyring and only adding keyrings to it. Add any other sort of key and a different insertion path is taken - all 16+1 objects must want to cluster in the same node slot. This can be tested by: r=`keyctl newring sandbox @s` for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done This should work fine, but oopses when the 17th keyring is added. Since ops->diff_objects() is always called with the first pointer pointing to the object to be inserted (ie. the NULL pointer), we can fix the problem by changing the to-be-inserted object pointer to point to the index key passed into assoc_array_insert() instead. Whilst we're at it, we also switch the arguments so that they are the same as for ->compare_object(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0 ... RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0 ... Call Trace: [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2 [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908 [<ffffffff811929a7>] __key_link_begin+0x78/0xe5 [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a [<ffffffff81192e0a>] SyS_add_key+0x123/0x183 [<ffffffff81400ddb>] tracesys+0xdd/0xe2 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Stephen Gallagher <sgallagh@redhat.com>
* | | | | Merge tag 'pm-3.13-rc3-fixup' of ↵Linus Torvalds2013-12-091-8/+0
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixup from Rafael Wysocki: "This reverts two cpufreq commits that fixed issues for some people, but broke things for others, so revert them and we'll need to fix the original problems differently" * tag 'pm-3.13-rc3-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "cpufreq: fix garbage kobjects on errors during suspend/resume" Revert "cpufreq: suspend governors on system suspend/hibernate"
| * | | | | Revert "cpufreq: suspend governors on system suspend/hibernate"Rafael J. Wysocki2013-12-081-8/+0
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate) causes hibernation problems to happen on Bjørn Mork's and Paul Bolle's systems, so revert it. Fixes: 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate) Reported-by: Bjørn Mork <bjorn@mork.no> Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | Merge tag 'staging-3.13-rc3' of ↵Linus Torvalds2013-12-082-0/+14
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some bugfixes for the staging and IIO drivers for 3.13-rc3. The resolve the vm memory issue in the tidspbridge driver, fix a much-reported build failure in an ARM driver, and some other IIO bugfixes that have been reported" * tag 'staging-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: Staging: TIDSPBRIDGE: Use vm_iomap_memory for mmap-ing instead of remap_pfn_range Fix build failure for gp2ap020a00f.c iio: hid-sensors: Fix power and report state HID: hid-sensor-hub: Add logical min and max
| * \ \ \ \ Merge tag 'iio-fixes-for-3.13b' of ↵Greg Kroah-Hartman2013-12-032-0/+14
| |\ \ \ \ \ | | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Second round of IIO fixes for the 3.13 cycle. 2 fixes here. * The gp2ap020a00f is a simple missing kconfig dependency. * The hid sensors hub fix is a work around for an issue introduced by hardware changes due to a certain large software vendor having an 'interesting' interpretation of the specification and hence indexing some arrays from 1 rather than 0. The fix takes advantage of the logical min and max reading facilities introduced by the precursor patch to figure out whether we have a 0 indexed or 1 indexed device and to adjust appropriately. It also drops a previous kconfig option that allowed this issue to be worked around at build time.
| | * | | | iio: hid-sensors: Fix power and report stateSrinivas Pandruvada2013-12-021-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the original HID sensor hub firmwares all Named array enums were to 0-based. But the most recent hub implemented as 1-based, because of the implementation by one of the major OS vendor. Using logical minimum for the field as the base of enum. So we add logical minimum to the selector values before setting those fields. Some sensor hub FWs already changed logical minimum from 0 to 1 to reflect this and hope every other vendor will follow. There is no easy way to add a common HID quirk for NAry elements, even if the standard specifies these field as NAry, the collection used to describe selectors is still just "logical". Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
| | * | | | HID: hid-sensor-hub: Add logical min and maxSrinivas Pandruvada2013-12-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exporting logical minimum and maximum of HID fields as part of the hid sensor attribute info. This can be used for range checking and to calculate enumeration base for NAry fields of HID sensor hub. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
* | | | | | Merge tag 'usb-3.13-rc3' of ↵Linus Torvalds2013-12-082-0/+4
|\ \ \ \ \ \ | |_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a bunch of USB fixes for 3.13-rc3. Nothing major, but we seem to have an argument about a XHCI fix, so I'm not including a revert that Sarah requested, because that breaks a USB network driver, and I can't revert the USB network driver fix without reintroducing other bugs that it fixed. So as it is, everything should now be working. Worse case, I can revert the XHCI fix before 3.13-final is out, but it seems to work well here with my testing, so all should be good. Other than that, some driver updates based on reports" * tag 'usb-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (40 commits) usb: hub: Use correct reset for wedged USB3 devices that are NOTATTACHED usb: ohci-pxa27x: include linux/dma-mapping.h USB: cdc-acm: Added support for the Lenovo RD02-D400 USB Modem usb: tools: fix a regression issue that gcc can't link to pthread USB: switch maintainership of chipidea to Peter USB: pl2303: fixed handling of CS5 setting USB: ftdi_sio: fixed handling of unsupported CSIZE setting USB: mos7840: correct handling of CS5 setting USB: spcp8x5: correct handling of CS5 setting usb: wusbcore: fix deadlock in wusbhc_gtk_rekey usb: wusbcore: do device lookup while holding the hc mutex usb: wusbcore: send keepalives to unauthenticated devices USB: option: support new huawei devices USB: serial: option: blacklist interface 1 for Huawei E173s-6 usb: xhci: Link TRB must not occur within a USB payload burst usb: gadget: f_mass_storage: call try_to_freeze only when its safe usb: gadget: tcm_usb_gadget: mark bot_cleanup_old_alt static usb: gadget: ffs: fix sparse warning usb: gadget: zero: module parameters can be static usb: gadget: storage: fix sparse warning ...