summaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'hash' of git://ftp.sciencehorizons.net/linuxLinus Torvalds2016-05-284-110/+141
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6ca95 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was noting suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.) On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
| * <linux/hash.h>: Add support for architecture-specific functionsGeorge Spelvin2016-05-281-3/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is just the infrastructure; there are no users yet. This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares the existence of <asm/hash.h>. That file may define its own versions of various functions, and define HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones. Included is a self-test (in lib/test_hash.c) that verifies the basics. It is NOT in general required that the arch-specific functions compute the same thing as the generic, but if a HAVE_* symbol is defined with the value 1, then equality is tested. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Philippe De Muyter <phdm@macq.eu> Cc: linux-m68k@lists.linux-m68k.org Cc: Alistair Francis <alistai@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
| * Eliminate bad hash multipliers from hash_32() and hash_64()George Spelvin2016-05-281-53/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "simplified" prime multipliers made very bad hash functions, so get rid of them. This completes the work of 689de1d6ca. To avoid the inefficiency which was the motivation for the "simplified" multipliers, hash_64() on 32-bit systems is changed to use a different algorithm. It makes two calls to hash_32() instead. drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32 for some horrible reason, so it inherits a copy of the old definition. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Antti Palosaari <crope@iki.fi> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
| * Change hash_64() return value to 32 bitsGeorge Spelvin2016-05-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That's all that's ever asked for, and it makes the return type of hash_long() consistent. It also allows (upcoming patch) an optimized implementation of hash_64 on 32-bit machines. I tried adding a BUILD_BUG_ON to ensure the number of bits requested was never more than 32 (most callers use a compile-time constant), but adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler unless tools/perf/MANIFEST is updated, and understanding that code base well enough to update it is too much trouble. I did the rest of an allyesconfig build with such a check, and nothing tripped. Signed-off-by: George Spelvin <linux@sciencehorizons.net>
| * <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()George Spelvin2016-05-281-31/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Finally, the first use of previous two patches: eliminate the separate ad-hoc string hash functions in the sunrpc code. Now hash_str() is a wrapper around hash_string(), and hash_mem() is likewise a wrapper around full_name_hash(). Note that sunrpc code *does* call hash_mem() with a zero length, which is why the previous patch needed to handle that in full_name_hash(). (Thanks, Bruce, for finding that!) This also eliminates the only caller of hash_long which asks for more than 32 bits of output. The comment about the quality of hashlen_string() and full_name_hash() is jumping the gun by a few patches; they aren't very impressive now, but will be improved greatly later in the series. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: linux-nfs@vger.kernel.org
| * fs/namei.c: Add hashlen_string() functionGeorge Spelvin2016-05-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'd like to make more use of the highly-optimized dcache hash functions throughout the kernel, rather than have every subsystem create its own, and a function that hashes basic null-terminated strings is required for that. (The name is to emphasize that it returns both hash and length.) It's actually useful in the dcache itself, specifically d_alloc_name(). Other uses in the next patch. full_name_hash() is also tweaked to make it more generally useful: 1) Take a "char *" rather than "unsigned char *" argument, to be consistent with hash_name(). 2) Handle zero-length inputs. If we want more callers, we don't want to make them worry about corner cases. Signed-off-by: George Spelvin <linux@sciencehorizons.net>
| * Pull out string hash to <linux/stringhash.h>George Spelvin2016-05-282-26/+73
| | | | | | | | | | | | | | | | ... so they can be used without the rest of <linux/dcache.h> The hashlen_* macros will make sense next patch. Signed-off-by: George Spelvin <linux@sciencehorizons.net>
* | Merge tag 'chrome-platform' of ↵Linus Torvalds2016-05-281-2/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform Pull chrome platform updates from Olof Johansson "A handful of Chrome driver and binding changes this merge window: - a few patches to fix probing and configuration of pstore - a few patches adding Elan touchpad registration on a few devices - EC changes: a security fix dealing with max message sizes and addition of compat_ioctl support. - keyboard backlight control support There was also an accidential duplicate registration of trackpads on 'Leon', which was reverted just recently" * tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform: Revert "platform/chrome: chromeos_laptop: Add Leon Touch" platform/chrome: chromeos_laptop - Add Elan touchpad for Wolf platform/chrome: chromeos_laptop - Add elan trackpad option for C720 platform/chrome: cros_ec_dev - Populate compat_ioctl platform/chrome: cros_ec_lightbar - use name instead of ID to hide lightbar attributes platform/chrome: cros_ec_dev - Fix security issue platform/chrome: Add Chrome OS keyboard backlight LEDs support platform/chrome: use to_platform_device() platform/chrome: pstore: Move to larger record size. platform/chrome: pstore: probe for ramoops buffer using acpi platform/chrome: chromeos_laptop: Add Leon Touch
| * | platform/chrome: cros_ec_dev - Fix security issueGwendal Grignou2016-05-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent memory scribble by checking that ioctl buffer size parameters are sane. Without this check, on 32 bits system, if .insize = 0xffffffff - 20 and .outsize the amount to scribble, we would overflow, allocate a small amounts and be able to write outside of the malloc'ed area. Adding a hard limit allows argument checking of the ioctl. With the current EC, it is expected .insize and .outsize to be at around 512 bytes or less. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Signed-off-by: Olof Johansson <olof@lixom.net>
* | | Merge tag 'sound-4.7-rc1-2' of ↵Linus Torvalds2016-05-281-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull more sound updates from Takashi Iwai: "This is the second update round for 4.7-rc1. Most of changes are about the pending ASoC updates and fixes, including a few new drivers. Below are some highlights: ASoC: - New drivers for MAX98371 and TAS5720 - SPI support for TLV320AIC32x4, along with the module split - TDM support for STI Uniperf IPs - Remaining topology API fixes / updates HDA: - A couple of Dell quirks and new Realtek codec support" * tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits) ALSA: hda - Fix headset mic detection problem for one Dell machine spi: spi-ep93xx: Fix the PTR_ERR() argument ALSA: hda/realtek - Add support for ALC295/ALC3254 ASoC: kirkwood: fix build failure ALSA: hda - Fix headphone noise on Dell XPS 13 9360 ASoC: ak4642: Enable cache usage to fix crashes on resume ASoC: twl6040: Disconnect AUX output pads on digital mute ASoC: tlv320aic32x4: Properly implement the positive and negative pins into the mixers rcar: src: skip disabled-SRC nodes ASoC: max98371 Remove duplicate entry in max98371_reg ASoC: twl6040: Select LPPLL during standby ASoC: rsnd: don't use prohibited number to PDMACHCRn.SRS ASoC: simple-card: Add pm callbacks to platform driver ASoC: pxa: Fix module autoload for platform drivers ASoC: topology: Fix memory leak in widget creation ASoC: Add max98371 codec driver ASoC: rsnd: count .probe/.remove for rsnd_mod_call() ASoC: topology: Check size mismatch of ABI objects before parsing ASoC: topology: Check failure to create a widget ASoC: add support for TAS5720 digital amplifier ...
| | \ \
| | \ \
| *-. \ \ Merge remote-tracking branches 'asoc/fix/ak4642', 'asoc/fix/ep93xx', ↵Mark Brown2016-05-251-0/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | 'asoc/fix/kirkwood' and 'asoc/fix/twl6040' into asoc-linus
| | | * | | ASoC: twl6040: Disconnect AUX output pads on digital mutePeter Ujfalusi2016-05-181-0/+1
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disconnect also the path to AUXL from the HF path during digital_mute to avoid pop noise leakage to Line-out pads. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org>
| * | | / Merge tag 'asoc-v4.7' into asoc-linusMark Brown2016-05-251-0/+6
| |\| | | | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ASoC: Updates for v4.7 The updates this time around are almost all driver code: - Further slow progress on the topology code. - Substantial updates and improvements for the da7219, es8328, fsl-ssi Intel and rcar drivers. # gpg: Signature made Mon 16 May 2016 12:08:43 BST using RSA key ID 5D5487D0 # gpg: Good signature from "Mark Brown <broonie@sirena.org.uk>" # gpg: aka "Mark Brown <broonie@debian.org>" # gpg: aka "Mark Brown <broonie@kernel.org>" # gpg: aka "Mark Brown <broonie@tardis.ed.ac.uk>" # gpg: aka "Mark Brown <broonie@linaro.org>" # gpg: aka "Mark Brown <Mark.Brown@linaro.org>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 3F25 68AA C269 98F9 E813 A1C5 C3F4 36CA 30F5 D8EB # Subkey fingerprint: ADE6 68AA 6757 18B5 9FE2 9FEA 24D6 8B72 5D54 87D0
* | | | Merge branch 'i2c/for-next' of ↵Linus Torvalds2016-05-271-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: "Here is the second pull request from I2C for this merge window: - one new feature (which nearly fell through the cracks): i2c-dev does now use the cdev API so it can handle >256 minors. Seems people do need that. - two fixes for the just added DMA feature for i2c-rcar - some typo fixes" * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: dev: don't start function name with 'return' i2c: dev: switch from register_chrdev to cdev API i2c: xlr: rename ARCH_TANGOX to ARCH_TANGO i2c: at91: change log when dma configuration fails misc: at24: Fix typo in at24 header file i2c: rcar: should depend on HAS_DMA i2c: rcar: use dma_request_chan()
| * | | | misc: at24: Fix typo in at24 header fileMoritz Fischer2016-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit fixes a simple typo s/mvmem/nvmem in the example. Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-05-273-4/+8
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking
| * | | | | switch ->setxattr() to passing dentry and inode separatelyAl Viro2016-05-272-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smack ->d_instantiate() uses ->setxattr(), so to be able to call it before we'd hashed the new dentry and attached it to inode, we need ->setxattr() instances getting the inode as an explicit argument rather than obtaining it from dentry. Similar change for ->getxattr() had been done in commit ce23e64. Unlike ->getxattr() (which is used by both selinux and smack instances of ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately it got missed back then. Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | switch xattr_handler->set() to passing dentry and inode separatelyAl Viro2016-05-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | preparation for similar switch in ->setxattr() (see the next commit for rationale). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | add down_write_killable_nested()Al Viro2016-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | make IS_ERR_VALUE() complain about non-pointer-sized argumentsLinus Torvalds2016-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the allmodconfig x86-64 build is clean wrt IS_ERR_VALUE() uses on integers, add a cast to a pointer and back to the argument, so that any new mis-uses of IS_ERR_VALUE() will cause warnings like warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] so that we don't re-introduce any bogus uses. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | mm: remove more IS_ERR_VALUE abusesLinus Torvalds2016-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The do_brk() and vm_brk() return value was "unsigned long" and returned the starting address on success, and an error value on failure. The reasons are entirely historical, and go back to it basically behaving like the mmap() interface does. However, nobody actually wanted that interface, and it causes totally pointless IS_ERR_VALUE() confusion. What every single caller actually wants is just the simpler integer return of zero for success and negative error number on failure. So just convert to that much clearer and more common calling convention, and get rid of all the IS_ERR_VALUE() uses wrt vm_brk(). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | mm: fix section mismatch warningLinus Torvalds2016-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The register_page_bootmem_info_node() function needs to be marked __init in order to avoid a new warning introduced by commit f65e91df25aa ("mm: use early_pfn_to_nid in register_page_bootmem_info_node"). Otherwise you'll get a warning about how a non-init function calls early_pfn_to_nid (which is __meminit) Cc: Yang Shi <yang.shi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge git://git.infradead.org/intel-iommuLinus Torvalds2016-05-271-2/+21
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull intel IOMMU updates from David Woodhouse: "This patchset improves the scalability of the Intel IOMMU code by resolving two spinlock bottlenecks and eliminating the linearity of the IOVA allocator, yielding up to ~5x performance improvement and approaching 'iommu=off' performance" * git://git.infradead.org/intel-iommu: iommu/vt-d: Use per-cpu IOVA caching iommu/iova: introduce per-cpu caching to iova allocation iommu/vt-d: change intel-iommu to use IOVA frame numbers iommu/vt-d: avoid dev iotlb logic for domains with no dev iotlbs iommu/vt-d: only unmap mapped entries iommu/vt-d: correct flush_unmaps pfn usage iommu/vt-d: per-cpu deferred invalidation queues iommu/vt-d: refactoring of deferred flush entries
| * | | | | | iommu/iova: introduce per-cpu caching to iova allocationOmer Peleg2016-04-201-2/+21
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IOVA allocation has two problems that impede high-throughput I/O. First, it can do a linear search over the allocated IOVA ranges. Second, the rbtree spinlock that serializes IOVA allocations becomes contended. Address these problems by creating an API for caching allocated IOVA ranges, so that the IOVA allocator isn't accessed frequently. This patch adds a per-CPU cache, from which CPUs can alloc/free IOVAs without taking the rbtree spinlock. The per-CPU caches are backed by a global cache, to avoid invoking the (linear-time) IOVA allocator without needing to make the per-CPU cache size excessive. This design is based on magazines, as described in "Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources" (currently available at https://www.usenix.org/legacy/event/usenix01/bonwick.html) Adding caching on top of the existing rbtree allocator maintains the property that IOVAs are densely packed in the IO virtual address space, which is important for keeping IOMMU page table usage low. To keep the cache size reasonable, we bound the IOVA space a CPU can cache by 32 MiB (we cache a bounded number of IOVA ranges, and only ranges of size <= 128 KiB). The shared global cache is bounded at 4 MiB of IOVA space. Signed-off-by: Omer Peleg <omer@cs.technion.ac.il> [mad@cs.technion.ac.il: rebased, cleaned up and reworded the commit message] Signed-off-by: Adam Morrison <mad@cs.technion.ac.il> Reviewed-by: Shaohua Li <shli@fb.com> Reviewed-by: Ben Serebrin <serebrin@google.com> [dwmw2: split out VT-d part into a separate patch] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-05-273-0/+15
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second batch of KVM updates from Radim Krčmář: "General: - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat had nothing to do with QEMU in the first place -- the tool only interprets debugfs) - expose per-vm statistics in debugfs and support them in kvm_stat (KVM always collected per-vm statistics, but they were summarised into global statistics) x86: - fix dynamic APICv (VMX was improperly configured and a guest could access host's APIC MSRs, CVE-2016-4440) - minor fixes ARM changes from Christoffer Dall: - new vgic reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove the legacy one. - fix for a non-critical issue with virtual abort injection to guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits) tools: kvm_stat: Add comments tools: kvm_stat: Introduce pid monitoring KVM: Create debugfs dir and stat files for each VM MAINTAINERS: Add kvm tools tools: kvm_stat: Powerpc related fixes tools: Add kvm_stat man page tools: Add kvm_stat vm monitor script kvm:vmx: more complete state update on APICv on/off KVM: SVM: Add more SVM_EXIT_REASONS KVM: Unify traced vector format svm: bitwise vs logical op typo KVM: arm/arm64: vgic-new: Synchronize changes to active state KVM: arm/arm64: vgic-new: enable build KVM: arm/arm64: vgic-new: implement mapped IRQ handling KVM: arm/arm64: vgic-new: Wire up irqfd injection KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable KVM: arm/arm64: vgic-new: vgic_init: implement map_resources KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init ...
| * | | | | | KVM: Create debugfs dir and stat files for each VMJanosch Frank2016-05-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a kvm debugfs subdirectory for each VM, which is named after its pid and file descriptor. The directories contain the same kind of files that are already in the kvm debugfs directory, but the data exported through them is now VM specific. This makes the debugfs kvm data a convenient alternative to the tracepoints which already have per VM data. The debugfs data is easy to read and low overhead. CC: Dan Carpenter <dan.carpenter@oracle.com> [includes fixes by Dan Carpenter] Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge tag 'kvm-arm-for-4-7-take2' of ↵Paolo Bonzini2016-05-242-0/+8
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM Changes for v4.7 take 2 "The GIC is dead; Long live the GIC" This set of changes include the new vgic, which is a reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove it. Also fixes a non-critical issue with virtual abort injection to guests.
| | * | | | | | KVM: arm/arm64: vgic-new: Add CTLR, TYPER and IIDR handlersMarc Zyngier2016-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those three registers are v2 emulation specific, so their implementation lives entirely in vgic-mmio-v2.c. Also they are handled in one function, as their implementation is pretty simple. When the guest enables the distributor, we kick all VCPUs to get potentially pending interrupts serviced. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | KVM: arm/arm64: vgic-new: Add GICv3 world switch backendMarc Zyngier2016-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the GICv3 virtual interface registers differ from their GICv2 siblings, we need different handlers for processing maintenance interrupts and reading/writing to the LRs. Implement the respective handler functions and connect them to existing code to be called if the host is using a GICv3. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | KVM: arm/arm64: vgic-new: Add GICv2 world switch backendMarc Zyngier2016-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processing maintenance interrupts and accessing the list registers are dependent on the host's GIC version. Introduce vgic-v2.c to contain GICv2 specific functions. Implement the GICv2 specific code for syncing the emulation state into the VGIC registers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | KVM: arm/arm64: move GICv2 emulation defines into arm-gic-v3.hAndre Przywara2016-05-201-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As (some) GICv3 hosts can emulate a GICv2, some GICv2 specific masks for the list register definition also apply to GICv3 LRs. At the moment we have those definitions in the KVM VGICv3 implementation, so let's move them into the GICv3 header file to have them automatically defined. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
* | | | | | | | Merge branch 'kbuild' of ↵Linus Torvalds2016-05-261-2/+31
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and unexports symbols which are not used in the current config [Nicolas Pitre] - several kbuild rule cleanups [Masahiro Yamada] - warning option adjustments for gcov etc [Arnd Bergmann] - a few more small fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits) kbuild: move -Wunused-const-variable to W=1 warning level kbuild: fix if_change and friends to consider argument order kbuild: fix adjust_autoksyms.sh for modules that need only one symbol kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line gcov: disable -Wmaybe-uninitialized warning gcov: disable tree-loop-im to reduce stack usage gcov: disable for COMPILE_TEST Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES Kbuild: change CC_OPTIMIZE_FOR_SIZE definition kbuild: forbid kernel directory to contain spaces and colons kbuild: adjust ksym_dep_filter for some cmd_* renames kbuild: Fix dependencies for final vmlinux link kbuild: better abstract vmlinux sequential prerequisites kbuild: fix call to adjust_autoksyms.sh when output directory specified kbuild: Get rid of KBUILD_STR kbuild: rename cmd_as_s_S to cmd_cpp_s_S kbuild: rename cmd_cc_i_c to cmd_cpp_i_c kbuild: drop redundant "PHONY += FORCE" kbuild: delete unnecessary "@:" kbuild: mark help target as PHONY ...
| * | | | | | | | kbuild: add fine grained build dependencies for exported symbolsNicolas Pitre2016-03-291-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like with kconfig options, we now have the ability to compile in and out individual EXPORT_SYMBOL() declarations based on the content of include/generated/autoksyms.h. However we don't want the entire world to be rebuilt whenever that file is touched. Let's apply the same build dependency trick used for CONFIG_* symbols where the time stamp of empty files whose paths matching those symbols is used to trigger fine grained rebuilds. In our case the key is the symbol name passed to EXPORT_SYMBOL(). However, unlike config options, we cannot just use fixdep to parse the source code for EXPORT_SYMBOL(ksym) because several variants exist and parsing them all in a separate tool, and keeping it in synch, is not trivially maintainable. Furthermore, there are variants such as EXPORT_SYMBOL_GPL(pci_user_read_config_##size); that are instanciated via a macro for which we can't easily determine the actual exported symbol name(s) short of actually running the preprocessor on them. Storing the symbol name string in a special ELF section doesn't work for targets that output assembly or preprocessed source. So the best way is really to leverage the preprocessor by having it output actual symbol names anchored by a special sequence that can be easily filtered out. Then the list of symbols is simply fed to fixdep to be merged with the other dependencies. That implies the preprocessor is executed twice for each source file. A previous attempt relied on a warning pragma for each EXPORT_SYMBOL() instance that was filtered apart from stderr by the build system with a sed script during the actual compilation pass. Unfortunately the preprocessor/compiler diagnostic output isn't stable between versions and this solution, although more efficient, was deemed too fragile. Because of the lowercasing performed by fixdep, there might be name collisions triggering spurious rebuilds for similar symbols. But this shouldn't be a big issue in practice. (This is the case for CONFIG_* symbols and I didn't want to be different here, whatever the original reason for doing so.) To avoid needless build overhead, the exported symbol name gathering is performed only when CONFIG_TRIM_UNUSED_KSYMS is selected. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | export.h: allow for per-symbol configurable EXPORT_SYMBOL()Nicolas Pitre2016-03-291-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to include/generated/autoconf.h, include/generated/autoksyms.h will contain a list of defines for each EXPORT_SYMBOL() that we want active. The format is: #define __KSYM_<symbol_name> 1 This list will be auto-generated with another patch. For now we only include the preprocessor magic to automatically create or omit the corresponding struct kernel_symbol declaration. Given the content of include/generated/autoksyms.h may not be known in advance, an empty file is created early on to let the build proceed. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-05-264-19/+7
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4 update "mm/zsmalloc: don't fail if can't create debugfs info" dma-debug: avoid spinlock recursion when disabling dma-debug mm: oom_reaper: remove some bloat memcg: fix mem_cgroup_out_of_memory() return value. ocfs2: fix improper handling of return errno mm: slub: remove unused virt_to_obj() mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly seqlock: fix raw_read_seqcount_latch()
| * | | | | | | | | mm: oom_reaper: remove some bloatMichal Hocko2016-05-262-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmput_async is currently used only from the oom_reaper which is defined only for CONFIG_MMU. We can save work_struct in mm_struct for !CONFIG_MMU. [akpm@linux-foundation.org: fix typo, per Minchan] Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz Reported-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | mm: slub: remove unused virt_to_obj()Andrey Ryabinin2016-05-261-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's unused since commit 7ed2f9e66385 ("mm, kasan: SLAB support") Link: http://lkml.kernel.org/r/1464020961-2242-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | seqlock: fix raw_read_seqcount_latch()Alexey Dobriyan2016-05-261-2/+2
| | |_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockless_dereference() is supposed to take pointer not integer. Link: http://lkml.kernel.org/r/20160521201448.GA7429@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | Merge tag 'dax-locking-for-4.7' of ↵Linus Torvalds2016-05-262-1/+22
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull DAX locking updates from Ross Zwisler: "Filesystem DAX locking for 4.7 - We use a bit in an exceptional radix tree entry as a lock bit and use it similarly to how page lock is used for normal faults. This fixes races between hole instantiation and read faults of the same index. - Filesystem DAX PMD faults are disabled, and will be re-enabled when PMD locking is implemented" * tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Remove i_mmap_lock protection dax: Use radix tree entry lock to protect cow faults dax: New fault locking dax: Allow DAX code to replace exceptional entries dax: Define DAX lock bit for radix tree exceptional entry dax: Make huge page handling depend of CONFIG_BROKEN dax: Fix condition for filling of PMD holes
| * | | | | | | | | dax: Use radix tree entry lock to protect cow faultsJan Kara2016-05-192-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing cow faults, we cannot directly fill in PTE as we do for other faults as we rely on generic code to do proper accounting of the cowed page. We also have no page to lock to protect against races with truncate as other faults have and we need the protection to extend until the moment generic code inserts cowed page into PTE thus at that point we have no protection of fs-specific i_mmap_sem. So far we relied on using i_mmap_lock for the protection however that is completely special to cow faults. To make fault locking more uniform use DAX entry lock instead. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| * | | | | | | | | dax: New fault lockingJan Kara2016-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently DAX page fault locking is racy. CPU0 (write fault) CPU1 (read fault) __dax_fault() __dax_fault() get_block(inode, block, &bh, 0) -> not mapped get_block(inode, block, &bh, 0) -> not mapped if (!buffer_mapped(&bh)) if (vmf->flags & FAULT_FLAG_WRITE) get_block(inode, block, &bh, 1) -> allocates blocks if (page) -> no if (!buffer_mapped(&bh)) if (vmf->flags & FAULT_FLAG_WRITE) { } else { dax_load_hole(); } dax_insert_mapping() And we are in a situation where we fail in dax_radix_entry() with -EIO. Another problem with the current DAX page fault locking is that there is no race-free way to clear dirty tag in the radix tree. We can always end up with clean radix tree and dirty data in CPU cache. We fix the first problem by introducing locking of exceptional radix tree entries in DAX mappings acting very similarly to page lock and thus synchronizing properly faults against the same mapping index. The same lock can later be used to avoid races when clearing radix tree dirty tag. Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| * | | | | | | | | dax: Allow DAX code to replace exceptional entriesJan Kara2016-05-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we forbid page_cache_tree_insert() to replace exceptional radix tree entries for DAX inodes. However to make DAX faults race free we will lock radix tree entries and when hole is created, we need to replace such locked radix tree entry with a hole page. So modify page_cache_tree_insert() to allow that. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| * | | | | | | | | dax: Define DAX lock bit for radix tree exceptional entryJan Kara2016-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will use lowest available bit in the radix tree exceptional entry for locking of the entry. Define it. Also clean up definitions of DAX entry type bits in DAX exceptional entries to use defined constants instead of hardcoding numbers and cleanup checking of these bits to not rely on how other bits in the entry are set. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| * | | | | | | | | dax: Make huge page handling depend of CONFIG_BROKENJan Kara2016-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the handling of huge pages for DAX is racy. For example the following can happen: CPU0 (THP write fault) CPU1 (normal read fault) __dax_pmd_fault() __dax_fault() get_block(inode, block, &bh, 0) -> not mapped get_block(inode, block, &bh, 0) -> not mapped if (!buffer_mapped(&bh) && write) get_block(inode, block, &bh, 1) -> allocates blocks truncate_pagecache_range(inode, lstart, lend); dax_load_hole(); This results in data corruption since process on CPU1 won't see changes into the file done by CPU0. The race can happen even if two normal faults race however with THP the situation is even worse because the two faults don't operate on the same entries in the radix tree and we want to use these entries for serialization. So make THP support in DAX code depend on CONFIG_BROKEN for now. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
* | | | | | | | | | Merge tag 'dax-misc-for-4.7' of ↵Linus Torvalds2016-05-263-13/+28
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull misc DAX updates from Vishal Verma: "DAX error handling for 4.7 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver. - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1. Other misc changes: - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail. - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks" * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: fix a comment in dax_zero_page_range and dax_truncate_page dax: for truncate/hole-punch, do zeroing through the driver if possible dax: export a low-level __dax_zero_page_range helper dax: use sb_issue_zerout instead of calling dax_clear_sectors dax: enable dax in the presence of known media errors (badblocks) dax: fallback from pmd to pte on error block: Update blkdev_dax_capable() for consistency xfs: Add alignment check for DAX mount ext2: Add alignment check for DAX mount ext4: Add alignment check for DAX mount block: Add bdev_dax_supported() for dax mount checks block: Add vfs_msg() interface dax: Remove redundant inode size checks dax: Remove pointless writeback from dax_do_io() dax: Remove zeroing from dax_io() dax: Remove dead zeroing code from fault handlers ext2: Avoid DAX zeroing to corrupt data ext2: Fix block zeroing in ext2_get_blocks() for DAX dax: Remove complete_unwritten argument DAX: move RADIX_DAX_ definitions to dax.c
| * | | | | | | | | dax: export a low-level __dax_zero_page_range helperChristoph Hellwig2016-05-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows XFS to perform zeroing using the iomap infrastructure and avoid buffer heads. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> [vishal: fix conflicts with dax-error-handling] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
| * | | | | | | | | dax: use sb_issue_zerout instead of calling dax_clear_sectorsMatthew Wilcox2016-05-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dax_clear_sectors() cannot handle poisoned blocks. These must be zeroed using the BIO interface instead. Convert ext2 and XFS to use only sb_issue_zerout(). Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> [vishal: Also remove the dax_clear_sectors function entirely] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
| * | | | | | | | | dax: enable dax in the presence of known media errors (badblocks)Dan Williams2016-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1/ If a mapping overlaps a bad sector fail the request. 2/ Do not opportunistically report more dax-capable capacity than is requested when errors present. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> [vishal: fix a conflict with system RAM collision patches] [vishal: add a 'size' parameter to ->direct_access] [vishal: fix a conflict with DAX alignment check patches] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
| * | | | | | | | | block: Update blkdev_dax_capable() for consistencyToshi Kani2016-05-172-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blkdev_dax_capable() is similar to bdev_dax_supported(), but needs to remain as a separate interface for checking dax capability of a raw block device. Rename and relocate blkdev_dax_capable() to keep them maintained consistently, and call bdev_direct_access() for the dax capability check. There is no change in the behavior. Link: https://lkml.org/lkml/2016/5/9/950 Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@fb.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
| * | | | | | | | | block: Add bdev_dax_supported() for dax mount checksToshi Kani2016-05-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAX imposes additional requirements to a device. Add bdev_dax_supported() which performs all the precondition checks necessary for filesystem to mount the device with dax option. Also add a new check to verify if a partition is aligned by 4KB. When a partition is unaligned, any dax read/write access fails, except for metadata update. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@fb.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>