summaryrefslogtreecommitdiffstats
path: root/lib
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'akpm' (incoming from Andrew Morton)Linus Torvalds2014-10-291-2/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm/balloon_compaction: fix deflation when compaction is disabled sh: fix sh770x SCIF memory regions zram: avoid NULL pointer access in concurrent situation mm/slab_common: don't check for duplicate cache names ocfs2: fix d_splice_alias() return code checking mm: rmap: split out page_remove_file_rmap() mm: memcontrol: fix missed end-writeback page accounting mm: page-writeback: inline account_page_dirtied() into single caller lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}() drivers/rtc/rtc-bq32k.c: fix register value memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock kernel/kmod: fix use-after-free of the sub_info structure drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc mm, thp: fix collapsing of hugepages on madvise drivers: of: add return value to of_reserved_mem_device_init() mm: free compound page with correct order gcov: add ARM64 to GCOV_PROFILE_ALL fsnotify: next_i is freed during fsnotify_unmount_inodes. mm/compaction.c: avoid premature range skip in isolate_migratepages_range ...
| * lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()Jan Kara2014-10-291-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by a multiple of BITS_PER_LONG, they will try to shift a long value by BITS_PER_LONG bits which is undefined. Change the functions to avoid the undefined shift. Coverity id: 1192175 Coverity id: 1192174 Signed-off-by: Jan Kara <jack@suse.cz> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2014-10-291-3/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer fixes from Jens Axboe: "A small collection of fixes for the current kernel. This contains: - Two error handling fixes from Jan Kara. One for null_blk on failure to add a device, and the other for the block/scsi_ioctl SCSI_IOCTL_SEND_COMMAND fixing up the error jump point. - A commit added in the merge window for the bio integrity bits unfortunately disabled merging for all requests if CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that integrity checking wont disallow merges when not enabled. - A fix from Ming Lei for merging and generating too many segments. This caused a BUG in virtio_blk. - Two error handling printk() fixups from Robert Elliott, improving the information given when we rate limit. - Error handling fixup on elevator_init() failure from Sudip Mukherjee. - A fix from Tony Battersby, fixing up a memory leak in the scatterlist handling with scsi-mq" * 'for-linus' of git://git.kernel.dk/linux-block: block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined lib/scatterlist: fix memory leak with scsi-mq block: fix wrong error return in elevator_init() scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND null_blk: Cleanup error recovery in null_add_dev() blk-merge: recaculate segment if it isn't less than max segments fs: clarify rate limit suppressed buffer I/O errors fs: merge I/O error prints into one line
| * lib/scatterlist: fix memory leak with scsi-mqTony Battersby2014-10-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | Fix a memory leak with scsi-mq triggered by commands with large data transfer length. Fixes: c53c6d6a68b1 ("scatterlist: allow chaining to preallocated chunks") Cc: <stable@vger.kernel.org> # 3.17.x Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Merge tag 'random_for_linus' of ↵Linus Torvalds2014-10-241-0/+16
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull /dev/random updates from Ted Ts'o: "This adds a memzero_explicit() call which is guaranteed not to be optimized away by GCC. This is important when we are wiping cryptographically sensitive material" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: crypto: memzero_explicit - make sure to clear out sensitive data random: add and use memzero_explicit() for clearing data
| * | random: add and use memzero_explicit() for clearing dataDaniel Borkmann2014-10-171-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zatimend has reported that in his environment (3.16/gcc4.8.3/corei7) memset() calls which clear out sensitive data in extract_{buf,entropy, entropy_user}() in random driver are being optimized away by gcc. Add a helper memzero_explicit() (similarly as explicit_bzero() variants) that can be used in such cases where a variable with sensitive data is being cleared out in the end. Other use cases might also be in crypto code. [ I have put this into lib/string.c though, as it's always built-in and doesn't need any dependencies then. ] Fixes kernel bugzilla: 82041 Reported-by: zatimend@hotmail.co.uk Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | | Merge branch 'x86-efi-for-linus' of ↵Linus Torvalds2014-10-231-0/+29
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI updates from Peter Anvin: "This patchset falls under the "maintainers that grovel" clause in the v3.18-rc1 announcement. We had intended to push it late in the merge window since we got it into the -tip tree relatively late. Many of these are relatively simple things, but there are a couple of key bits, especially Ard's and Matt's patches" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rtc: Disable EFI rtc for x86 efi: rtc-efi: Export platform:rtc-efi as module alias efi: Delete the in_nmi() conditional runtime locking efi: Provide a non-blocking SetVariable() operation x86/efi: Adding efi_printks on memory allocationa and pci.reads x86/efi: Mark initialization code as such x86/efi: Update comment regarding required phys mapped EFI services x86/efi: Unexport add_efi_memmap variable x86/efi: Remove unused efi_call* macros efi: Resolve some shadow warnings arm64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() ia64: efi: Format EFI memory type & attrs with efi_md_typeattr_format() x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format() efi: Introduce efi_md_typeattr_format() efi: Add macro for EFI_MEMORY_UCE memory attribute x86/efi: Clear EFI_RUNTIME_SERVICES if failing to enter virtual mode arm64/efi: Do not enter virtual mode if booting with efi=noruntime or noefi arm64/efi: uefi_init error handling fix efi: Add kernel param efi=noruntime lib: Add a generic cmdline parse function parse_option_str ...
| * | Merge branch 'next' into efi-next-mergeMatt Fleming2014-10-031-0/+29
| |\ \ | | | | | | | | | | | | | | | | Conflicts: arch/x86/boot/compressed/eboot.c
| | * | lib: Add a generic cmdline parse function parse_option_strDave Young2014-10-031-0/+29
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There should be a generic function to parse params like a=b,c Adding parse_option_str in lib/cmdline.c which will return true if there's specified option set in the params. Also updated efi=old_map parsing code to use the new function Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
* | | Merge tag 'fbdev-3.18' of ↵Linus Torvalds2014-10-184-0/+3100
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev updates from Tomi Valkeinen: - new 6x10 font - various small fixes and cleanups * tag 'fbdev-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (30 commits) fonts: Add 6x10 font videomode: provide dummy inline functions for !CONFIG_OF video/atmel_lcdfb: Introduce regulator support fbdev: sh_mobile_hdmi: Re-init regs before irq re-enable on resume framebuffer: fix screen corruption when copying framebuffer: fix border color arm, fbdev, omap2, LLVMLinux: Remove nested function from omapfb arm, fbdev, omap2, LLVMLinux: Remove nested function from omap2 dss video: fbdev: valkyriefb.c: use container_of to resolve fb_info_valkyrie from fb_info video: fbdev: pxafb.c: use container_of to resolve pxafb_info/layer from fb_info video: fbdev: cyber2000fb.c: use container_of to resolve cfb_info from fb_info video: fbdev: controlfb.c: use container_of to resolve fb_info_control from fb_info video: fbdev: sa1100fb.c: use container_of to resolve sa1100fb_info from fb_info video: fbdev: stifb.c: use container_of to resolve stifb_info from fb_info video: fbdev: sis: sis_main.c: Cleaning up missing null-terminate in conjunction with strncpy video: valkyriefb: Fix unused variable warning in set_valkyrie_clock() video: fbdev: use %*ph specifier to dump small buffers video: mx3fb: always enable BACKLIGHT_LCD_SUPPORT video: fbdev: au1200fb: delete double assignment video: fbdev: sis: delete double assignment ...
| * | | fonts: Add 6x10 fontMaarten ter Huurne2014-10-094-0/+3100
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This font is suitable for framebuffer consoles on devices with a 320x240 screen, to get a reasonable number of characters (53x24) that are still at a readable size. The font is derived from the existing 6x11 font, but gets 3 extra lines without sacrificing readability. Also I redesigned a some glyhps so they are more distinct and better fill the available space. Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
* | | Merge tag 'md/3.18' of git://neil.brown.name/mdLinus Torvalds2014-10-181-6/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md updates from Neil Brown: - a few minor bug fixes - quite a lot of code tidy-up and simplification - remove PRINT_RAID_DEBUG ioctl. I'm fairly sure it is unused, and it isn't particularly useful. * tag 'md/3.18' of git://neil.brown.name/md: (21 commits) lib/raid6: Add log level to printks md: move EXPORT_SYMBOL to after function in md.c md: discard PRINT_RAID_DEBUG ioctl md: remove MD_BUG() md: clean up 'exit' labels in md_ioctl(). md: remove unnecessary test for MD_MAJOR in md_ioctl() md: don't allow "-sync" to be set for device in an active array. md: remove unwanted white space from md.c md: don't start resync thread directly from md thread. md: Just use RCU when checking for overlap between arrays. md: avoid potential long delay under pers_lock md: simplify export_array() md: discard find_rdev_nr in favour of find_rdev_nr_rcu md: use wait_event() to simplify md_super_wait() md: be more relaxed about stopping an array which isn't started. md/raid1: process_checks doesn't use its return value. md/raid5: fix init_stripe() inconsistencies md/raid10: another memory leak due to reshape. md: use set_bit/clear_bit instead of shift/mask for bi_flags changes. md/raid1: minor typos and reformatting. ...
| * | | lib/raid6: Add log level to printksAnton Blanchard2014-10-141-6/+6
| | |/ | |/| | | | | | | | | | Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: NeilBrown <neilb@suse.de>
* | | crypto: LLVMLinux: Remove VLAIS usage from libcrc32c.cJan-Simon Möller2014-10-141-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This patch allocates the appropriate amount of memory using a char array using the SHASH_DESC_ON_STACK macro. The new code can be compiled with both gcc and clang. Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de> Signed-off-by: Behan Webster <behanw@converseincode.com> Reviewed-by: Mark Charlebois <charlebm@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: pageexec@freemail.hu Cc: "David S. Miller" <davem@davemloft.net>
* | | lib/vsprintf: add %*pE[achnops] format specifierAndy Shevchenko2014-10-141-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows user to print a given buffer as an escaped string. The rules are applied according to an optional mix of flags provided by additional format letters. For example, if the given buffer is: 1b 62 20 5c 43 07 22 90 0d 5d The result strings would be: %*pE "\eb \C\a"\220\r]" %*pEhp "\x1bb \C\x07"\x90\x0d]" %*pEa "\e\142\040\\\103\a\042\220\r\135" Please, read Documentation/printk-formats.txt and lib/string_helpers.c kernel documentation to get further information. [akpm@linux-foundation.org: tidy up comment layout, per Joe] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: Joe Perches <joe@perches.com> Cc: "John W . Linville" <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib / string_helpers: introduce string_escape_mem()Andy Shevchenko2014-10-142-4/+510
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is almost the opposite function to string_unescape(). Nevertheless it handles \0 and could be used for any byte buffer. The documentation is supplied together with the function prototype. The test cases covers most of the scenarios and would be expanded later on. [akpm@linux-foundation.org: avoid 1k stack consumption] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "John W . Linville" <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib / string_helpers: refactoring the test suiteAndy Shevchenko2014-10-141-12/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch prepares test suite for a following update. It introduces test_string_check_buf() helper which checks the result and dumps an error. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "John W . Linville" <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib / string_helpers: move documentation to c-fileAndy Shevchenko2014-10-141-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The introduced function string_escape_mem() is a kind of opposite to string_unescape. We have several users of such functionality each of them created custom implementation. The series contains clean up of test suite, adding new call, and switching few users to use it via %*pE specifier. Test suite covers all of existing and most of potential use cases. This patch (of 11): The documentation of API belongs to c-file. This patch moves it accordingly. There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: "John W . Linville" <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib: string: Make all calls to strnicmp into calls to strncasecmpRasmus Villemoes2014-10-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous patch made strnicmp into a wrapper for strncasecmp. This patch makes all in-tree users of strnicmp call strncasecmp directly, while still making sure that the strnicmp symbol can be used by out-of-tree modules. It should be considered a temporary hack until all in-tree callers have been converted. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib/string.c: remove duplicated functionRasmus Villemoes2014-10-141-17/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lib/string.c contains two functions, strnicmp and strncasecmp, which do roughly the same thing, namely compare two strings case-insensitively up to a given bound. They have slightly different implementations, but the only important difference is that strncasecmp doesn't handle len==0 appropriately; it effectively becomes strcasecmp in that case. strnicmp correctly says that two strings are always equal in their first 0 characters. strncasecmp is the POSIX name for this functionality. So rename the non-broken function to the standard name. To minimize the impact on the rest of the kernel (and since both are exported to modules), make strnicmp a wrapper for strncasecmp. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Grant Likely <grant.likely@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib: rename TEST_MODULE to TEST_LKMValentin Rothberg2014-10-142-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "_MODULE" suffix is reserved for tristates compiled as loadable kernel modules (LKM). The "TEST_MODULE" feature thereby violates this convention. The feature is used to compile the lib/test_module.c kernel module. Sadly this convention is not made explicit, but the Kconfig code documents it. The following code (./scripts/kconfig/confdata.c) is used to generate the autoconf.h header file during the build process. When a feature is selected as a kernel module ('m'), it is suffixed with "_MODULE" to indicate it. switch (*value) { case 'n': break; case 'm': suffix = "_MODULE"; /* fall through */ This causes problems for static code analysis, which assumes a consistent use of the "_MODULE" suffix. This patch renames the feature and its reference in a Makefile to "TEST_LKM", which still expresses the test of a LKM. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib: remove prio_heapLai Jiangshan2014-10-142-71/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The prio_heap code is unused since commit 889ed9ceaa97 ("cgroup: remove css_scan_tasks()"). It should be compiled out to shrink the binary kernel size which can be done via introducing CONFIG_PRIO_HEAD or by removing the code. We can simply recover the code from git when needed, so it would be better to remove it IMO. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Francesco Fusco <ffusco@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: George Spelvin <linux@horizon.com> Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib/textsearch.c: remove textsearch_put reference from commentsRaphael Silva2014-10-141-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no textsearch_put(). Remove it from the comments to avoid misunderstanding. Textsearch prepare no longer needs textsearch_put(). Signed-off-by: Raphael Silva <rapphil@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | lib/dynamic_debug.c: use seq_open_private() instead of seq_open()Rob Jones2014-10-141-15/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using seq_open_private() removes boilerplate code from ddebug_proc_open(). The resultant code is shorter and easier to follow. This patch does not change any functionality. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Acked-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2014-10-131-0/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave Hansen) - Various sched/idle refinements for better idle handling (Nicolas Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot) - sched/numa updates and optimizations (Rik van Riel) - sysbench speedup (Vincent Guittot) - capacity calculation cleanups/refactoring (Vincent Guittot) - Various cleanups to thread group iteration (Oleg Nesterov) - Double-rq-lock removal optimization and various refactorings (Kirill Tkhai) - various sched/deadline fixes ... and lots of other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/dl: Use dl_bw_of() under rcu_read_lock_sched() sched/fair: Delete resched_cpu() from idle_balance() sched, time: Fix build error with 64 bit cputime_t on 32 bit systems sched: Improve sysbench performance by fixing spurious active migration sched/x86: Fix up typo in topology detection x86, sched: Add new topology for multi-NUMA-node CPUs sched/rt: Use resched_curr() in task_tick_rt() sched: Use rq->rd in sched_setaffinity() under RCU read lock sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' sched: Use dl_bw_of() under RCU read lock sched/fair: Remove duplicate code from can_migrate_task() sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW sched: print_rq(): Don't use tasklist_lock sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock() sched: Fix the task-group check in tg_has_rt_tasks() sched/fair: Leverage the idle state info when choosing the "idlest" cpu sched: Let the scheduler see CPU idle states sched/deadline: Fix inter- exclusive cpusets migrations sched/deadline: Clear dl_entity params when setscheduling to different class sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault() ...
| * | | sched: Add default-disabled option to BUG() when stack end location is ↵Aaron Tomlin2014-09-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | overwritten Currently in the event of a stack overrun a call to schedule() does not check for this type of corruption. This corruption is often silent and can go unnoticed. However once the corrupted region is examined at a later stage, the outcome is undefined and often results in a sporadic page fault which cannot be handled. This patch checks for a stack overrun and takes appropriate action since the damage is already done, there is no point in continuing. Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: oleg@redhat.com Cc: riel@redhat.com Cc: prarit@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: mpe@ellerman.id.au Cc: tglx@linutronix.de Cc: rostedt@goodmis.org Cc: hannes@cmpxchg.org Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lubomir Rintel <lkundrak@v3.sk> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1410527779-8133-4-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2014-10-131-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main updates in this cycle were: - mutex MCS refactoring finishing touches: improve comments, refactor and clean up code, reduce debug data structure footprint, etc. - qrwlock finishing touches: remove old code, self-test updates. - small rwsem optimization - various smaller fixes/cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Revert qrwlock recusive stuff locking/rwsem: Avoid double checking before try acquiring write lock locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code locking/semaphore: Resolve some shadow warnings locking/selftest: Support queued rwlock locking/lockdep: Restrict the use of recursive read_lock() with qrwlock locking/spinlocks: Always evaluate the second argument of spin_lock_nested() locking/Documentation: Update locking/mutex-design.txt disadvantages locking/Documentation: Move locking related docs into Documentation/locking/ locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate locking/mutexes: Refactor optimistic spinning code locking/mcs: Remove obsolete comment locking/mutexes: Document quick lock release when unlocking locking/mutexes: Standardize arguments in lock/unlock slowpaths locking: Remove deprecated smp_mb__() barriers
| * | | | locking/lockdep: Revert qrwlock recusive stuffPeter Zijlstra2014-10-031-49/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f0bab73cb539 ("locking/lockdep: Restrict the use of recursive read_lock() with qrwlock") changed lockdep to try and conform to the qrwlock semantics which differ from the traditional rwlock semantics. In particular qrwlock is fair outside of interrupt context, but in interrupt context readers will ignore all fairness. The problem modeling this is that read and write side have different lock state (interrupts) semantics but we only have a single representation of these. Therefore lockdep will get confused, thinking the lock can cause interrupt lock inversions. So revert it for now; the old rwlock semantics were already imperfectly modeled and the qrwlock extra won't fit either. If we want to properly fix this, I think we need to resurrect the work by Gautham did a few years ago that split the read and write state of locks: http://lwn.net/Articles/332801/ FWIW the locking selftest that would've failed (and was reported by Borislav earlier) is something like: RL(X1); /* IRQ-ON */ LOCK(A); UNLOCK(A); RU(X1); IRQ_ENTER(); RL(X1); /* IN-IRQ */ RU(X1); IRQ_EXIT(); At which point it would report that because A is an IRQ-unsafe lock we can suffer the following inversion: CPU0 CPU1 lock(A) lock(X1) lock(A) <IRQ> lock(X1) And this is 'wrong' because X1 can recurse (assuming the above lock are in fact read-lock) but lockdep doesn't know about this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Cc: ego@linux.vnet.ibm.com Cc: bp@alien8.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/selftest: Support queued rwlockWaiman Long2014-08-131-7/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The queued rwlock does not support the use of recursive read-lock in the process context. With changes in the lockdep code to check and disallow recursive read-lock, it is also necessary for the locking selftest to be updated to change the process context recursive read locking results from SUCCESS to FAILURE for rwlock. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1407345722-61615-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/Documentation: Move locking related docs into Documentation/locking/Davidlohr Bueso2014-08-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specifically: Documentation/locking/lockdep-design.txt Documentation/locking/lockstat.txt Documentation/locking/mutex-design.txt Documentation/locking/rt-mutex-design.txt Documentation/locking/rt-mutex.txt Documentation/locking/spinlocks.txt Documentation/locking/ww-mutex-design.txt Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: jason.low2@hp.com Cc: aswin@hp.com Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Mason <clm@fb.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: David Airlie <airlied@linux.ie> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lubomir Rintel <lkundrak@v3.sk> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: fengguang.wu@intel.com Link: http://lkml.kernel.org/r/1406752916-3341-6-git-send-email-davidlohr@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'locking-arch-for-linus' of ↵Linus Torvalds2014-10-131-47/+36
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull arch atomic cleanups from Ingo Molnar: "This is a series kept separate from the main locking tree, which cleans up and improves various details in the atomics type handling: - Remove the unused atomic_or_long() method - Consolidate and compress atomic ops implementations between architectures, to reduce linecount and to make it easier to add new ops. - Rewrite generic atomic support to only require cmpxchg() from an architecture - generate all other methods from that" * 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() locking, mips: Fix atomics locking, sparc64: Fix atomics locking,arch: Rewrite generic atomic support locking,arch,xtensa: Fold atomic_ops locking,arch,sparc: Fold atomic_ops locking,arch,sh: Fold atomic_ops locking,arch,powerpc: Fold atomic_ops locking,arch,parisc: Fold atomic_ops locking,arch,mn10300: Fold atomic_ops locking,arch,mips: Fold atomic_ops locking,arch,metag: Fold atomic_ops locking,arch,m68k: Fold atomic_ops locking,arch,m32r: Fold atomic_ops locking,arch,ia64: Fold atomic_ops locking,arch,hexagon: Fold atomic_ops locking,arch,cris: Fold atomic_ops locking,arch,avr32: Fold atomic_ops locking,arch,arm64: Fold atomic_ops locking,arch,arm: Fold atomic_ops ...
| * | | | | locking,arch: Rewrite generic atomic supportPeter Zijlstra2014-08-141-47/+36
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite generic atomic support to only require cmpxchg(), generate all other primitives from that. Furthermore reduce the endless repetition for all these primitives to a few CPP macros. This way we get more for less lines. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140508135852.940119622@infradead.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Howells <dhowells@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'next' of ↵Linus Torvalds2014-10-121-0/+16
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris. Mostly ima, selinux, smack and key handling updates. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) integrity: do zero padding of the key id KEYS: output last portion of fingerprint in /proc/keys KEYS: strip 'id:' from ca_keyid KEYS: use swapped SKID for performing partial matching KEYS: Restore partial ID matching functionality for asymmetric keys X.509: If available, use the raw subjKeyId to form the key description KEYS: handle error code encoded in pointer selinux: normalize audit log formatting selinux: cleanup error reporting in selinux_nlmsg_perm() KEYS: Check hex2bin()'s return when generating an asymmetric key ID ima: detect violations for mmaped files ima: fix race condition on ima_rdwr_violation_check and process_measurement ima: added ima_policy_flag variable ima: return an error code from ima_add_boot_aggregate() ima: provide 'ima_appraise=log' kernel option ima: move keyring initialization to ima_init() PKCS#7: Handle PKCS#7 messages that contain no X.509 certs PKCS#7: Better handling of unsupported crypto KEYS: Overhaul key identification when searching for asymmetric keys KEYS: Implement binary asymmetric key ID handling ...
| * \ \ \ \ Merge commit 'v3.16' into nextJames Morris2014-10-011-1/+1
| |\ \ \ \ \
| * | | | | | Provide a binary to hex conversion functionDavid Howells2014-09-161-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a function to convert a buffer of binary data into an unterminated ascii hex string representation of that data. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* | | | | | | Merge branch 'for-3.18' of ↵Linus Torvalds2014-10-104-105/+238
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix 6ae833c7fe0c ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...
| * | | | | | | percpu_ref: make INIT_ATOMIC and switch_to_atomic() stickyTejun Heo2014-09-241-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a percpu_ref which is initialized with PERPCU_REF_INIT_ATOMIC or switched to atomic mode via switch_to_atomic() automatically reverts to percpu mode on the first percpu_ref_reinit(). This makes the atomic mode difficult to use for cases where a percpu_ref is used as a persistent on/off switch which may be cycled multiple times. This patch makes such atomic state sticky so that it survives through kill/reinit cycles. After this patch, atomic state is cleared only by an explicit percpu_ref_switch_to_percpu() call. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org>
| * | | | | | | percpu_ref: add PERCPU_REF_INIT_* flagsTejun Heo2014-09-241-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the recent addition of percpu_ref_reinit(), percpu_ref now can be used as a persistent switch which can be turned on and off repeatedly where turning off maps to killing the ref and waiting for it to drain; however, there currently isn't a way to initialize a percpu_ref in its off (killed and drained) state, which can be inconvenient for certain persistent switch use cases. Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic selection of operation mode; however, currently a newly initialized percpu_ref is always in percpu mode making it impossible to avoid the latency overhead of switching to atomic mode. This patch adds @flags to percpu_ref_init() and implements the following flags. * PERCPU_REF_INIT_ATOMIC : start ref in atomic mode * PERCPU_REF_INIT_DEAD : start ref killed and drained These flags should be able to serve the above two use cases. v2: target_core_tpg.c conversion was missing. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org>
| * | | | | | | percpu_ref: decouple switching to percpu mode and reinitTejun Heo2014-09-241-19/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref has treated the dropping of the base reference and switching to atomic mode as an integral operation; however, there's nothing inherent tying the two together. The use cases for percpu_ref have been expanding continuously. While the current init/kill/reinit/exit model can cover a lot, the coupling of kill/reinit with atomic/percpu mode switching is turning out to be too restrictive for use cases where many percpu_refs are created and destroyed back-to-back with only some of them reaching extended operation. The coupling also makes implementing always-atomic debug mode difficult. This patch separates out percpu mode switching into percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on top of it. * DEAD still requires ATOMIC. A dead ref can't be switched to percpu mode w/o going through reinit. v2: __percpu_ref_switch_to_percpu() was missing static. Fixed. Reported by Fengguang aka kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: kbuild test robot <fengguang.wu@intel.com>
| * | | | | | | percpu_ref: decouple switching to atomic mode and killingTejun Heo2014-09-241-31/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref has treated the dropping of the base reference and switching to atomic mode as an integral operation; however, there's nothing inherent tying the two together. The use cases for percpu_ref have been expanding continuously. While the current init/kill/reinit/exit model can cover a lot, the coupling of kill/reinit with atomic/percpu mode switching is turning out to be too restrictive for use cases where many percpu_refs are created and destroyed back-to-back with only some of them reaching extended operation. The coupling also makes implementing always-atomic debug mode difficult. This patch separates out atomic mode switching into percpu_ref_switch_to_atomic() and reimplements percpu_ref_kill_and_confirm() on top of it. * The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now differentiated. Among get/put operations, percpu_ref_tryget_live() is the only one which cares about DEAD. * percpu_ref_switch_to_atomic() can be called multiple times on the same ref. This means that multiple @confirm_switch may get queued up which we can't do reliably without extra memory area. This is handled by making the later invocation synchronously wait for the completion of the previous one. This isn't particularly desirable but such synchronous waits shouldn't happen in most cases. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org>
| * | | | | | | percpu_ref: add PCPU_REF_DEADTejun Heo2014-09-241-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref will be restructured so that percpu/atomic mode switching and reference killing are dedoupled. In preparation, add PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD which is OR of ATOMIC and DEAD. For now, ATOMIC and DEAD are changed together and all PCPU_REF_ATOMIC uses are converted to PCPU_REF_ATOMIC_DEAD without causing any behavior changes. percpu_ref_init() now specifies an explicit alignment when allocating the percpu counters so that the pointer has enough unused low bits to accomodate the flags. Note that one flag was fine as min alignment for percpu memory is 2 bytes but two flags are already too many for the natural alignment of unsigned longs on archs like cris and m68k. v2: The original patch had BUILD_BUG_ON() which triggers if unsigned long's alignment isn't enough to accomodate the flags, which triggered on cris and m64k. percpu_ref_init() updated to specify the required alignment explicitly. Reported by Fengguang. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: kbuild test robot <fengguang.wu@intel.com>
| * | | | | | | percpu_ref: rename things to prepare for decoupling percpu/atomic mode switchTejun Heo2014-09-241-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref will be restructured so that percpu/atomic mode switching and reference killing are dedoupled. In preparation, do the following renames. * percpu_ref->confirm_kill -> percpu_ref->confirm_switch * __PERCPU_REF_DEAD -> __PERCPU_REF_ATOMIC * __percpu_ref_alive() -> __ref_is_percpu() This patch is pure rename and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com>
| * | | | | | | percpu_ref: replace pcpu_ prefix with percpu_Tejun Heo2014-09-241-27/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref uses pcpu_ prefix for internal stuff and percpu_ for externally visible ones. This is the same convention used in the percpu allocator implementation. It works fine there but percpu_ref doesn't have too much internal-only stuff and scattered usages of pcpu_ prefix are confusing than helpful. This patch replaces all pcpu_ prefixes with percpu_. This is pure rename and there's no functional change. Note that PCPU_REF_DEAD is renamed to __PERCPU_REF_DEAD to signify that the flag is internal. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com>
| * | | | | | | percpu_ref: minor code and comment updatesTejun Heo2014-09-241-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Some comments became stale. Updated. * percpu_ref_tryget() unnecessarily initializes @ret. Removed. * A blank line removed from percpu_ref_kill_rcu(). * Explicit function name in a WARN format string replaced with __func__. * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com>
| * | | | | | | percpu_ref: relocate percpu_ref_reinit()Tejun Heo2014-09-241-35/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref is gonna go through restructuring. Move percpu_ref_reinit() after percpu_ref_kill_and_confirm(). This will make later changes easier to follow and result in cleaner organization. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com>
| * | | | | | | Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during ↵Tejun Heo2014-09-241-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probe" This reverts commit 0a30288da1aec914e158c2d7a3482a85f632750f, which was a temporary fix for SCSI blk-mq stall issue. The following patches will fix the issue properly by introducing atomic mode to percpu_ref. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
| * | | | | | | Merge branch 'for-linus' of ↵Tejun Heo2014-09-247-8/+37
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
| * | | | | | | | percpu-refcount: make percpu_ref based on longs instead of intsTejun Heo2014-09-201-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref is currently based on ints and the number of refs it can cover is (1 << 31). This makes it impossible to use a percpu_ref to count memory objects or pages on 64bit machines as it may overflow. This forces those users to somehow aggregate the references before contributing to the percpu_ref which is often cumbersome and sometimes challenging to get the same level of performance as using the percpu_ref directly. While using ints for the percpu counters makes them pack tighter on 64bit machines, the possible gain from using ints instead of longs is extremely small compared to the overall gain from per-cpu operation. This patch makes percpu_ref based on longs so that it can be used to directly count memory objects or pages. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Johannes Weiner <hannes@cmpxchg.org>
| * | | | | | | | percpu-refcount: improve WARN messagesTejun Heo2014-09-201-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref's WARN messages can be a lot more helpful by indicating who's the culprit. Make them report the release function that the offending percpu-refcount is associated with. This should make it a lot easier to track down the reported invalid refcnting operations. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com>
| * | | | | | | | percpu-refcount: add @gfp to percpu_ref_init()Tejun Heo2014-09-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Percpu allocator now supports allocation mask. Add @gfp to percpu_ref_init() so that !GFP_KERNEL allocation masks can be used with percpu_refs too. This patch doesn't make any functional difference. v2: blk-mq conversion was missing. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Jens Axboe <axboe@kernel.dk>