summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ipc,shm: fix shm_file deletion racesGreg Thelen2013-11-211-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When IPC_RMID races with other shm operations there's potential for use-after-free of the shm object's associated file (shm_file). Here's the race before this patch: TASK 1 TASK 2 ------ ------ shm_rmid() ipc_lock_object() shmctl() shp = shm_obtain_object_check() shm_destroy() shum_unlock() fput(shp->shm_file) ipc_lock_object() shmem_lock(shp->shm_file) <OOPS> The oops is caused because shm_destroy() calls fput() after dropping the ipc_lock. fput() clears the file's f_inode, f_path.dentry, and f_path.mnt, which causes various NULL pointer references in task 2. I reliably see the oops in task 2 if with shmlock, shmu This patch fixes the races by: 1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock(). 2) modify at risk operations to check shm_file while holding ipc_object_lock(). Example workloads, which each trigger oops... Workload 1: while true; do id=$(shmget 1 4096) shm_rmid $id & shmlock $id & wait done The oops stack shows accessing NULL f_inode due to racing fput: _raw_spin_lock shmem_lock SyS_shmctl Workload 2: while true; do id=$(shmget 1 4096) shmat $id 4096 & shm_rmid $id & wait done The oops stack is similar to workload 1 due to NULL f_inode: touch_atime shmem_mmap shm_mmap mmap_region do_mmap_pgoff do_shmat SyS_shmat Workload 3: while true; do id=$(shmget 1 4096) shmlock $id shm_rmid $id & shmunlock $id & wait done The oops stack shows second fput tripping on an NULL f_inode. The first fput() completed via from shm_destroy(), but a racing thread did a get_file() and queued this fput(): locks_remove_flock __fput ____fput task_work_run do_notify_resume int_signal Fixes: c2c737a0461e ("ipc,shm: shorten critical region for shmat") Fixes: 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl") Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> # 3.10.17+ 3.11.6+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: thp: give transparent hugepage code a separate copy_pageDave Hansen2013-11-213-38/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, the migration code in migrate_page_copy() uses copy_huge_page() for hugetlbfs and thp pages: if (PageHuge(page) || PageTransHuge(page)) copy_huge_page(newpage, page); So, yay for code reuse. But: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); and a non-hugetlbfs page has no page_hstate(). This works 99% of the time because page_hstate() determines the hstate from the page order alone. Since the page order of a THP page matches the default hugetlbfs page order, it works. But, if you change the default huge page size on the boot command-line (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate so page_hstate() returns null and copy_huge_page() oopses pretty fast since copy_huge_page() dereferences the hstate: void copy_huge_page(struct page *dst, struct page *src) { struct hstate *h = page_hstate(src); if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) { ... Mel noticed that the migration code is really the only user of these functions. This moves all the copy code over to migrate.c and makes copy_huge_page() work for THP by checking for it explicitly. I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add THP migration for the NUMA working set scanning fault case") [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: fix "Use of uninitialized value" warningsJoe Perches2013-11-211-0/+1
| | | | | | | | | | | | | checkpatch is currently confused about some complex macros and references undefined variables $stat and $cond. Make sure these are defined before using them. Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Gerhard Sittig <gsi@denx.de> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* configfs: fix race between dentry put and lookupJunxiao Bi2013-11-211-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A race window in configfs, it starts from one dentry is UNHASHED and end before configfs_d_iput is called. In this window, if a lookup happen, since the original dentry was UNHASHED, so a new dentry will be allocated, and then in configfs_attach_attr(), sd->s_dentry will be updated to the new dentry. Then in configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) will be triggered and system panic. sys_open: sys_close: ... fput dput dentry_kill __d_drop <--- dentry unhashed here, but sd->dentry still point to this dentry. lookup_real configfs_lookup configfs_attach_attr---> update sd->s_dentry to new allocated dentry here. d_kill configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry) triggered here. To fix it, change configfs_d_iput to not update sd->s_dentry if sd->s_count > 2, that means there are another dentry is using the sd beside the one that is going to be put. Use configfs_dirent_lock in configfs_attach_attr to sync with configfs_d_iput. With the following steps, you can reproduce the bug. 1. enable ocfs2, this will mount configfs at /sys/kernel/config and fill configure in it. 2. run the following script. while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'next' of ↵Linus Torvalds2013-11-2013-34/+542
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc LE updates from Ben Herrenschmidt: "With my previous pull request I mentioned some remaining Little Endian patches, notably support for our new ABI, which I was sitting on making sure it was all finalized. The toolchain folks confirmed it now, the new ABI is stable and merged with gcc, so we are all good. Oh and we actually missed the actual Kconfig switch for LE so here it is, along with a couple more bug fixes. I have more fixes but not related to LE so I'll send them as a separate pull request tomorrow, let's get this one out of the way. Note that this supports running user space binaries using the new ABI, but the kernel itself still needs to be built with the old one. We'll bring fixes for that after -rc1. Here's Anton log that goes with this series: This patch series adds support for the new ABI, LPAR support for H_SET_MODE and finally adds a kconfig option and defconfig. ABIv2 support was recently committed to binutils and gcc, and should be merged into glibc soon. There are a number of very nice improvements including the removal of function descriptors. Rusty's kernel patches allow binaries of either ABI to work, easing the transition" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2 powerpc: Add pseries_le_defconfig powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option. powerpc: Don't use ELFv2 ABI to build the kernel powerpc: ELF2 binaries signal handling powerpc: ELF2 binaries launched directly. powerpc: Set eflags correctly for ELF ABIv2 core dumps. powerpc: Add TIF_ELF2ABI flag. pseries: Add H_SET_MODE to change exception endianness powerpc/pseries: Fix endian issues in pseries EEH code
| * powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2Ulrich Weigand2013-11-211-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've finally tracked down why my CR signal-unwind test case still fails on little-endian. The problem turned to be that the kernel installs a signal trampoline in the vDSO, and provides a DWARF CFI record for that trampoline. This CFI describes the save location for CR: rsave (70, 38*RSIZE + (RSIZE - CRSIZE)) which is correct for big-endian, but points to the wrong word on little-endian. This is wrong no matter which ABI. In addition, for the ELFv2 ABI, we should not only provide a CFI record for register 70 (cr2), but for all CR fields separately. Strictly speaking, I guess this would mean providing two separate vDSO images, one for ELFv1 processes and one for ELFv2 processes (or maybe playing some tricks with conditional DWARF expressions). However, having CFI records for the other CR fields in ELFv1 is not actually wrong, they just will be ignored. So it seems the simplest fix would be just to always provide CFI for all the fields. Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: Add pseries_le_defconfigAnton Blanchard2013-11-211-0/+352
| | | | | | | | | | Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.Anton Blanchard2013-11-211-0/+11
| | | | | | | | | | | | | | | | With the little endian support merged, we can add the CONFIG_CPU_LITTLE_ENDIAN kernel config option. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: Don't use ELFv2 ABI to build the kernelAlistair Popple2013-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | The kernel doesn't build correctly using the ELFv2 ABI. This patch ensures that the ELFv1 ABI is used when building a kernel with an ELFv2 enabled compiler. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: ELF2 binaries signal handlingRusty Russell2013-11-211-9/+16
| | | | | | | | | | | | | | | | | | | | For the ELFv2 ABI, the hander is the entry point, not a function descriptor. We also need to set up r12, and fortunately the fast_exception_return exit path restores r12 for us so nothing else is required. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: ELF2 binaries launched directly.Rusty Russell2013-11-211-15/+35
| | | | | | | | | | | | | | | | | | No function descriptor, but we set r12 up and set TIF_RESTOREALL as it normally isn't restored on return from syscall. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: Set eflags correctly for ELF ABIv2 core dumps.Rusty Russell2013-11-211-0/+2
| | | | | | | | | | | | | | | | We leave it at zero (though it could be 1) for old tasks. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc: Add TIF_ELF2ABI flag.Rusty Russell2013-11-212-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Little endian ppc64 is getting an exciting new ABI. This is reflected by the bottom two bits of e_flags in the ELF header: 0 == legacy binaries (v1 ABI) 1 == binaries using the old ABI (compiled with a new toolchain) 2 == binaries using the new ABI. We store this in a thread flag, because we need to set it in core dumps and for signal delivery. Our chief concern is that it doesn't use function descriptors. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * pseries: Add H_SET_MODE to change exception endiannessAnton Blanchard2013-11-214-0/+87
| | | | | | | | | | | | | | | | | | On little endian builds call H_SET_MODE so exceptions have the correct endianness. We need to reset the endian during kexec so do that in the MMU hashtable clear callback. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc/pseries: Fix endian issues in pseries EEH codeAnton Blanchard2013-11-211-9/+12
| | | | | | | | | | Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2024-388/+777
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha updates from Matt Turner: "It contains a few fixes and some work from Richard to make alpha emulation under QEMU much more usable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: Prevent a NULL ptr dereference in csum_partial_copy. alpha: perf: fix out-of-bounds array access triggered from raw event alpha: Use qemu+cserve provided high-res clock and alarm. alpha: Switch to GENERIC_CLOCKEVENTS alpha: Enable the rpcc clocksource for single processor alpha: Reorganize rtc handling alpha: Primitive support for CPU power down. alpha: Allow HZ to be configured alpha: Notice if we're being run under QEMU alpha: Eliminate compiler warning from memset macro
| * | alpha: Prevent a NULL ptr dereference in csum_partial_copy.Jay Estabrook2013-11-161-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduced by 3ddc5b46a8e90f3c92 ("kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user()"). Also fix some other places which could be problematic in a similar way, although they hadn't been proved so, as far as I can tell. Cc: Michael Cree <mcree@orcon.net.nz> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: perf: fix out-of-bounds array access triggered from raw eventWill Deacon2013-11-161-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vince's perf fuzzer uncovered the following issue on Alpha: Unable to handle kernel paging request at virtual address fffffbfe4e46a0e8 CPU 0 perf_fuzzer(1278): Oops 0 pc = [<fffffc000031fbc0>] ra = [<fffffc000031ff54>] ps = 0007 Not tainted pc is at alpha_perf_event_set_period+0x60/0xf0 ra is at alpha_pmu_enable+0x1a4/0x1c0 v0 = 0000000000000000 t0 = 00000000000fffff t1 = fffffc007b3f5800 t2 = fffffbff275faa94 t3 = ffffffffc9b9bd89 t4 = fffffbfe4e46a098 t5 = 0000000000000020 t6 = fffffbfe4e46a0b8 t7 = fffffc007f4c8000 s0 = 0000000000000000 s1 = fffffc0001b0c018 s2 = fffffc0001b0c020 s3 = fffffc007b3f5800 s4 = 0000000000000001 s5 = ffffffffc9b9bd85 s6 = 0000000000000001 a0 = 0000000000000006 a1 = fffffc007b3f5908 a2 = fffffbfe4e46a098 a3 = 00000005000108c0 a4 = 0000000000000000 a5 = 0000000000000000 t8 = 0000000000000001 t9 = 0000000000000001 t10= 0000000027829f6f t11= 0000000000000020 pv = fffffc000031fb60 at = fffffc0000950900 gp = fffffc0000940900 sp = fffffc007f4cbca8 Disabling lock debugging due to kernel taint Trace: [<fffffc000031ff54>] alpha_pmu_enable+0x1a4/0x1c0 [<fffffc000039f4e8>] perf_pmu_enable+0x48/0x60 [<fffffc00003a0d6c>] __perf_install_in_context+0x15c/0x230 [<fffffc000039d1f0>] remote_function+0x80/0xa0 [<fffffc00003a0c10>] __perf_install_in_context+0x0/0x230 [<fffffc000037b7e4>] smp_call_function_single+0x1b4/0x1d0 [<fffffc000039bb70>] task_function_call+0x60/0x80 [<fffffc00003a0c10>] __perf_install_in_context+0x0/0x230 [<fffffc000039bb44>] task_function_call+0x34/0x80 [<fffffc000039d3fc>] perf_install_in_context+0x9c/0x150 [<fffffc00003a0c10>] __perf_install_in_context+0x0/0x230 [<fffffc00003a5100>] SYSC_perf_event_open+0x360/0xac0 [<fffffc00003110c4>] entSys+0xa4/0xc0 This is due to the raw event encoding being used as an index directly into the ev67_mapping array, rather than being validated against the ev67_pmc_event_type enumeration instead. Unlike other architectures, which allow raw events to propagate into the hardware counters with little interference, the limited number of events on Alpha and the strict event <-> counter relationships mean that raw events actually correspond to the Linux-specific Alpha events, rather than anything defined by the architecture. This patch adds a new callback to alpha_pmu_t for validating the raw event encoding with the Linux event types for the PMU, preventing the out-of-bounds array access. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Michael Cree <mcree@orcon.net.nz> Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | alpha: Use qemu+cserve provided high-res clock and alarm.Richard Henderson2013-11-164-7/+165
| | | | | | | | | | | | | | | | | | | | | QEMU provides a high-resolution timer and alarm; use this for a clock source and clock event source when available. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Switch to GENERIC_CLOCKEVENTSRichard Henderson2013-11-165-109/+53
| | | | | | | | | | | | | | | | | | | | | | | | This allows us to get rid of some hacky code for SMP. Get rid of some cycle counter hackery that's now handled by generic code via clocksource + clock_event_device objects. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Enable the rpcc clocksource for single processorRichard Henderson2013-11-161-30/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | Don't depend on SMP, just check the number of processors online. This allows a single distribution kernel to use the clocksource when run on a single processor machine. Do depend on whether or not we're using WTINT. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Reorganize rtc handlingRichard Henderson2013-11-1611-223/+337
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Discontinue use of GENERIC_CMOS_UPDATE; rely on the RTC subsystem. The marvel platform requires that the rtc only be touched from the boot cpu. This had been partially implemented with hooks for get/set_rtc_time, but read/update_persistent_clock were not handled. Move the hooks from the machine_vec to a special rtc_class_ops struct. We had read_persistent_clock managing the epoch against which the rtc hw is based, but this didn't apply to get_rtc_time or set_rtc_time. This resulted in incorrect values when hwclock(8) gets involved. Allow the epoch to be set from the kernel command-line, overriding the autodetection, which is doomed to fail in 2020. Further, by implementing the rtc ioctl function, we can expose this epoch to userland. Elide the alarm functions that RTC_DRV_CMOS implements. This was highly questionable on Alpha, since the interrupt is used by the system timer. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Primitive support for CPU power down.Richard Henderson2013-11-165-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use WTINT to wait for the next interrupt. Squash the WTINT call if the PALcode doesn't support it (e.g. MILO). No attempt is yet made to skip clock ticks during normal scheduling in order to stay in power down mode longer. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Allow HZ to be configuredRichard Henderson2013-11-163-11/+59
| | | | | | | | | | | | | | | | | | | | | With the 1024Hz default, we spend 50% of QEMU emulation processing timer interrupts. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Notice if we're being run under QEMURichard Henderson2013-11-163-7/+38
| | | | | | | | | | | | | | | | | | | | | | | | When building a generic kernel, do a run-time check on the serial number, like we do for MILO. When building a custom kernel, make this a configure-time check. Signed-off-by: Richard Henderson <rth@twiddle.net>
| * | alpha: Eliminate compiler warning from memset macroRichard Henderson2013-11-164-15/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiling with GCC 4.8 yields several instances of crypto/vmac.c: In function ‘vmac_final’: crypto/vmac.c:616:9: warning: value computed is not used [-Wunused-value] memset(&mac, 0, sizeof(vmac_t)); ^ arch/alpha/include/asm/string.h:31:25: note: in definition of macro ‘memset’ ? __builtin_memset((s),0,(n)) \ ^ Converting the macro to an inline function eliminates this problem. However, doing only that causes problems with the GCC 3.x series. The inline function cannot be named "memset", as otherwise we wind up with recursion via __builtin_memset. Solve this by adjusting the symbols such that __memset is the inline, and ___memset is the real function. Signed-off-by: Richard Henderson <rth@twiddle.net>
* | | Merge branch 'parisc-3.13' of ↵Linus Torvalds2013-11-205-55/+41
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - revert an access_ok() patch which broke 32bit userspace on 64bit kernels - avoid a gcc miscompilation in two internal pa_memcpy() functions by not inlining those - do not export the definition of SOCK_NONBLOCK via uapi header (fixes build of audit package) - depending on the fault type we now correctly report either SIGBUS or SIGSEGV - a small fix to not compare a size_t variable for < 0 * 'parisc-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: size_t is unsigned, so comparison size < 0 doesn't make sense. parisc: improve SIGBUS/SIGSEGV error reporting parisc: break out SOCK_NONBLOCK define to own asm header file parisc: do not inline pa_memcpy() internal functions Revert "parisc: implement full version of access_ok()"
| * | | parisc: size_t is unsigned, so comparison size < 0 doesn't make sense.Helge Deller2013-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Helge Deller <deller@gmx.de> CC: Mikulas Patocka <mpatocka@redhat.com>
| * | | parisc: improve SIGBUS/SIGSEGV error reportingHelge Deller2013-11-191-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes most of the Linux Test Project testcases, e.g. fstat05. Signed-off-by: Helge Deller <deller@gmx.de>
| * | | parisc: break out SOCK_NONBLOCK define to own asm header fileHelge Deller2013-11-192-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Break SOCK_NONBLOCK out to its own asm-file as other arches do. This fixes build errors with auditd and probably other packages. Signed-off-by: Helge Deller <deller@gmx.de>
| * | | parisc: do not inline pa_memcpy() internal functionsHelge Deller2013-11-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc (4.8.x) creates wrong code when the pa_memcpy() functions are inlined. Especially in 32bit builds it calculates wrong return values if we encounter a fault during execution of the memcpy. Signed-off-by: Helge Deller <deller@gmx.de>
| * | | Revert "parisc: implement full version of access_ok()"Helge Deller2013-11-191-42/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 63379c135331c724d40a87b98eb62d2122981341. It broke userspace and adding more checking is not needed. Even checking if a syscall would access memory in page zero doesn't makes sense since it may lead to some syscalls returning -EFAULT where we would return other error codes on other platforms. In summary, just drop this change and return to always return 1. Signed-off-by: Helge Deller <deller@gmx.de>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2035-129/+101
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 updates from Hans-Christian Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: avr32: uapi: be sure of "_UAPI" prefix for all guard macros avr32: add kprobe_ctlblk memory struct avr32: fix out-of-range jump in large kernels avr32: setup crt for early panic()
| * | | | avr32: uapi: be sure of "_UAPI" prefix for all guard macrosChen Gang2013-11-2031-102/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For all uapi headers, need use "_UAPI" prefix for its guard macro (which will be stripped by "scripts/headers_installer.sh"). Also remove redundant files (bitsperlong.h, errno.h, fcntl.h, ioctl.h, ioctls.h, ipcbuf.h, kvm_para.h, mman.h, poll.h, resource.h, siginfo.h, statfs.h, and unistd.h) which are already in Kbuild. Also be sure that all "#endif" only have one empty line above, and each file has guard macro. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
| * | | | avr32: add kprobe_ctlblk memory structEirik Aanonsen2013-11-201-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This re-enables kprobes on AVR32 architecture. Signed-off-by: Eirik Aanonsen <eaa@wprmedical.com> Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
| * | | | avr32: fix out-of-range jump in large kernelsAndreas Bießmann2013-11-202-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes following error (for big kernels): ---8<--- arch/avr32/boot/u-boot/head.o: In function `no_tag_table': (.init.text+0x44): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o arch/avr32/kernel/built-in.o: In function `bad_return': (.ex.text+0x236): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o --->8--- It comes up when the kernel increases and 'panic()' is too far away to fit in the +/- 2MiB range. Which in turn issues from the 21-bit displacement in 'br{cond4}' mnemonic which is one of the two ways to do jumps (rjmp has just 10-bit displacement and therefore a way smaller range). This fact was stated before in 8d29b7b9f81d6b83d869ff054e6c189d6da73f1f. One solution to solve this is to add a local storage for the symbol address and just load the $pc with that value. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
| * | | | avr32: setup crt for early panic()Andreas Bießmann2013-11-202-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the CRT was (fully) set up in kernel_entry (bss cleared before in _start, but also not before jump to panic() in no_tag_table case). This patch fixes this up to have a fully working CRT when branching to panic() in no_tag_table. Signed-off-by: Andreas Bießmann <andreas@biessmann.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: stable@vger.kernel.org
* | | | | Merge tag 'squashfs-updates' of ↵Linus Torvalds2013-11-2020-243/+1145
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs updates from Phillip Lougher: "These patches optionally improve the multi-threading peformance of Squashfs by adding parallel decompression, and direct decompression into the page cache, eliminating an intermediate buffer (removing memcpy overhead and lock contention)" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Check stream is not NULL in decompressor_multi.c Squashfs: Directly decompress into the page cache for file data Squashfs: Restructure squashfs_readpage() Squashfs: Generalise paging handling in the decompressors Squashfs: add multi-threaded decompression using percpu variable squashfs: Enhance parallel I/O Squashfs: Refactor decompressor interface and code
| * | | | | Squashfs: Check stream is not NULL in decompressor_multi.cPhillip Lougher2013-11-201-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | Squashfs: Directly decompress into the page cache for file dataPhillip Lougher2013-11-205-1/+336
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. This uses the previously added page handler abstraction to push down the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers into the decompressors. This enables direct copying into the page cache without using the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously because it not only reduces memcopying, but it more importantly eliminates the lock contention on the intermediate buffer. Using single-thread decompression. dd if=file1 of=/dev/null bs=4096 & dd if=file2 of=/dev/null bs=4096 & dd if=file3 of=/dev/null bs=4096 & dd if=file4 of=/dev/null bs=4096 Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s After: 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | Squashfs: Restructure squashfs_readpage()Phillip Lougher2013-11-204-71/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | Squashfs: Generalise paging handling in the decompressorsPhillip Lougher2013-11-2013-67/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Further generalise the decompressors by adding a page handler abstraction. This adds helpers to allow the decompressors to access and process the output buffers in an implementation independant manner. This allows different types of output buffer to be passed to the decompressors, with the implementation specific aspects handled at decompression time, but without the knowledge being held in the decompressor wrapper code. This will allow the decompressors to handle Squashfs cache buffers, and page cache pages. This patch adds the abstraction and an implementation for the caches. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | Squashfs: add multi-threaded decompression using percpu variablePhillip Lougher2013-11-203-20/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: * the nature of percpu variables ensures decompression is load-balanced across the multiple cores. * simplicity. Disadvantages: it limits decompression to one thread per core. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | squashfs: Enhance parallel I/OMinchan Kim2013-11-203-1/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now squashfs have used for only one stream buffer for decompression so it hurts parallel read performance so this patch supports multiple decompressor to enhance performance parallel I/O. Four 1G file dd read on KVM machine which has 2 CPU and 4G memory. dd if=test/test1.dat of=/dev/null & dd if=test/test2.dat of=/dev/null & dd if=test/test3.dat of=/dev/null & dd if=test/test4.dat of=/dev/null & old : 1m39s -> new : 9s * From v1 * Change comp_strm with decomp_strm - Phillip * Change/add comments - Phillip Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | Squashfs: Refactor decompressor interface and codePhillip Lougher2013-11-2011-136/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decompressor interface and code was written from the point of view of single-threaded operation. In doing so it mixed a lot of single-threaded implementation specific aspects into the decompressor code and elsewhere which makes it difficult to seamlessly support multiple different decompressor implementations. This patch does the following: 1. It removes compressor_options parsing from the decompressor init() function. This allows the decompressor init() function to be dynamically called to instantiate multiple decompressors, without the compressor options needing to be read and parsed each time. 2. It moves threading and all sleeping operations out of the decompressors. In doing so, it makes the decompressors non-blocking wrappers which only deal with interfacing with the decompressor implementation. 3. It splits decompressor.[ch] into decompressor generic functions in decompressor.[ch], and moves the single threaded decompressor implementation into decompressor_single.c. The result of this patch is Squashfs should now be able to support multiple decompressors by adding new decompressor_xxx.c files with specialised implementations of the functions in decompressor_single.c Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
* | | | | | Revert "mm: create a separate slab for page->ptl allocation"Linus Torvalds2013-11-203-17/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit ea1e7ed33708c7a760419ff9ded0a6cb90586a50. Al points out that while the commit *does* actually create a separate slab for the page->ptl allocation, that slab is never actually used, and the code continues to use kmalloc/kfree. Damien Wyart points out that the original patch did have the conversion to use kmem_cache_alloc/free, so it got lost somewhere on its way to me. Revert the half-arsed attempt that didn't do anything. If we really do want the special slab (remember: this is all relevant just for debug builds, so it's not necessarily all that critical) we might as well redo the patch fully. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2017-187/+86
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs bits and pieces from Al Viro: "Assorted bits that got missed in the first pull request + fixes for a couple of coredump regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold try_to_ascend() into the sole remaining caller dcache.c: get rid of pointless macros take read_seqbegin_or_lock() and friends to seqlock.h consolidate simple ->d_delete() instances gfs2: endianness misannotations dump_emit(): use __kernel_write(), not vfs_write() dump_align(): fix the dumb braino
| * | | | | | fold try_to_ascend() into the sole remaining callerAl Viro2013-11-151-31/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There used to be a bunch of tree-walkers in dcache.c, all alike. try_to_ascend() had been introduced to abstract a piece of logics duplicated in all of them. These days all these tree-walkers are implemented via the same iterator (d_walk()), which is the only remaining caller of try_to_ascend(), so let's fold it back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | dcache.c: get rid of pointless macrosAl Viro2013-11-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()). At this point they are actively misguiding - they imply that values are compiler constants, which is no longer true. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | take read_seqbegin_or_lock() and friends to seqlock.hAl Viro2013-11-152-29/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>