summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'libnvdimm-for-4.7' of ↵Linus Torvalds2016-05-2342-787/+2252
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this update was stabilized before the merge window and appeared in -next. The "device dax" implementation was revised this week in response to review feedback, and to address failures detected by the recently expanded ndctl unit test suite. Not included in this pull request are two dax topic branches (dax error handling, and dax radix-tree locking). These topics were deferred to get a few more days of -next integration testing, and to coordinate a branch baseline with Ted and the ext4 tree. Vishal and Ross will send the error handling and locking topics respectively in the next few days. This branch has received a positive build result from the kbuild robot across 226 configs. Summary: - Device DAX for persistent memory: Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped without need of an intervening file system. Device DAX is strict, precise and predictable. Specifically this interface: a) Guarantees fault granularity with respect to a given page size (pte, pmd, or pud) set at configuration time. b) Enforces deterministic behavior by being strict about what fault scenarios are supported. Persistent memory is the first target, but the mechanism is also targeted for exclusive allocations of performance/feature differentiated memory ranges. - Support for the HPE DSM (device specific method) command formats. This enables management of these first generation devices until a unified DSM specification materializes. - Further ACPI 6.1 compliance with support for the common dimm identifier format. - Various fixes and cleanups across the subsystem" * tag 'libnvdimm-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (40 commits) libnvdimm, dax: fix deletion libnvdimm, dax: fix alignment validation libnvdimm, dax: autodetect support libnvdimm: release ida resources Revert "block: enable dax for raw block devices" /dev/dax, core: file operations and dax-mmap /dev/dax, pmem: direct access to persistent memory libnvdimm: stop requiring a driver ->remove() method libnvdimm, dax: record the specified alignment of a dax-device instance libnvdimm, dax: reserve space to store labels for device-dax libnvdimm, dax: introduce device-dax infrastructure nfit: add sysfs dimm 'family' and 'dsm_mask' attributes tools/testing/nvdimm: ND_CMD_CALL support nfit: disable vendor specific commands nfit: export subsystem ids as attributes nfit: fix format interface code byte order per ACPI6.1 nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism nfit, libnvdimm: clarify "commands" vs "_DSMs" libnvdimm: increase max envelope size for ioctl acpi/nfit: Add sysfs "id" for NVDIMM ID ...
| * Merge branch 'for-4.7/dax' into libnvdimm-for-nextDan Williams2016-05-2126-143/+935
| |\
| | * libnvdimm, dax: fix deletionDan Williams2016-05-213-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ndctl unit tests discovered that the dax enabling omitted updates to nd_detach_and_reset(). This routine clears device the configuration when the namespace is detached. Without this clearing userspace may assume that the device is in the process of being configured by another agent in the system. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * libnvdimm, dax: fix alignment validationDan Williams2016-05-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing the dax-device autodetect support revealed a probe failure with the following result: dax0.1: bad offset: 0x8200000 dax disabled The original pfn-device implementation inferred the alignment from ilog2(offset), now that the alignment is explicit the is_power_of_2() needs replacing with a real sanity check against the recorded alignment. Otherwise the alignment check is useless in the implicit case and only the minimum size of the offset matters. This self-consistency check is further validated by the probe path that will re-check that the offset is large enough to contain all the metadata required to enable the device. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * libnvdimm, dax: autodetect supportDan Williams2016-05-205-8/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | For autodetecting a previously established dax configuration we need the info block to indicate block-device vs device-dax mode, and we need to have the default namespace probe hand-off the configuration to the dax_pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * libnvdimm: release ida resourcesDan Williams2016-05-204-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ida instances allocate some internal memory for ->free_bitmap in addition to the base 'struct ida'. Use ida_destroy() to release that memory at module_exit(). Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * Revert "block: enable dax for raw block devices"Dan Williams2016-05-204-108/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 5a023cdba50c5f5f2bc351783b3131699deb3937. The functionality is superseded by the new "Device DAX" facility. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jan Kara <jack@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * /dev/dax, core: file operations and dax-mmapDan Williams2016-05-204-0/+325
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "Device DAX" core enables dax mappings of performance / feature differentiated memory. An open mapping or file handle keeps the backing struct device live, but new mappings are only possible while the device is enabled. Faults are handled under rcu_read_lock to synchronize with the enabled state of the device. Similar to the filesystem-dax case the backing memory may optionally have struct page entries. However, unlike fs-dax there is no support for private mappings, or mappings that are not backed by media (see use of zero-page in fs-dax). Mappings are always guaranteed to match the alignment of the dax_region. If the dax_region is configured to have a 2MB alignment, all mappings are guaranteed to be backed by a pmd entry. Contrast this determinism with the fs-dax case where pmd mappings are opportunistic. If userspace attempts to force a misaligned mapping, the driver will fail the mmap attempt. See dax_dev_check_vma() for other scenarios that are rejected, like MAP_PRIVATE mappings. Cc: Hannes Reinecke <hare@suse.de> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * /dev/dax, pmem: direct access to persistent memoryDan Williams2016-05-209-0/+478
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped without need of an intervening file system. Device DAX is strict, precise and predictable. Specifically this interface: 1/ Guarantees fault granularity with respect to a given page size (pte, pmd, or pud) set at configuration time. 2/ Enforces deterministic behavior by being strict about what fault scenarios are supported. For example, by forcing MADV_DONTFORK semantics and omitting MAP_PRIVATE support device-dax guarantees that a mapping always behaves/performs the same once established. It is the "what you see is what you get" access mechanism to differentiated memory vs filesystem DAX which has filesystem specific implementation semantics. Persistent memory is the first target, but the mechanism is also targeted for exclusive allocations of performance differentiated memory ranges. This commit is limited to the base device driver infrastructure to associate a dax device with pmem range. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * libnvdimm: stop requiring a driver ->remove() methodDan Williams2016-05-181-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dax_pmem driver was implementing an empty ->remove() method to satisfy the nvdimm bus driver that unconditionally calls ->remove(). Teach the core bus driver to check if ->remove() is NULL to remove that requirement. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | Merge branch 'for-4.7/acpi6.1' into libnvdimm-for-nextDan Williams2016-05-1861-725/+1038
| |\ \
| | * | nfit: export subsystem ids as attributesDan Williams2016-04-291-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to pci-sysfs export the subsystem information available in the NFIT. ACPI 6.1 clarifies that this data is copied as an array of bytes from the DIMM SPD data. Reported-by: Ryon Jensen <ryon.jensen@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | nfit: fix format interface code byte order per ACPI6.1Dan Williams2016-04-281-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPI6.1 clarifies that DCR fields are stored as an array of bytes, update the format interface code constants to match. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | acpi/nfit: Add sysfs "id" for NVDIMM IDToshi Kani2016-04-261-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPI 6.1, section 5.2.25.9, defines an identifier for an NVDIMM. Change the NFIT driver to add a new sysfs file "id" under nfit directory. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | acpi/nfit: Update nfit driver to comply with ACPI 6.1Toshi Kani2016-04-261-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ACPI 6.1, Table 5-133, updates NVDIMM Control Region Structure as follows. - Valid Fields, Manufacturing Location, and Manufacturing Date are added from reserved range. No change in the structure size. - IDs (SPD values) are stored as arrays of bytes (i.e. big-endian format). The spec clarifies that they need to be represented as arrays of bytes as well. This patch makes the following changes to support this update. - Change the NFIT driver to show SPD ID values in big-endian format. - Change sprintf format to use "0x" instead of "#" since "%#02x" does not prepend '0'. link: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | | Merge branch 'for-4.7/dsm' into libnvdimm-for-nextDan Williams2016-05-1811-58/+283
| |\ \ \
| | * | | nfit: add sysfs dimm 'family' and 'dsm_mask' attributesDan Williams2016-05-051-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Communicate the command format and supported functions to userspace tooling. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | tools/testing/nvdimm: ND_CMD_CALL supportDan Williams2016-05-051-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable nfit_test to use nd_cmd_pkg marshaling. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | nfit: disable vendor specific commandsDan Williams2016-05-051-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Module option to limit userspace to the publicly defined command set. For cases where private DIMM commands may be interfering with the kernel's handling of DIMM state this option can be set to block vendor specific commands. Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanismDan Williams2016-04-284-17/+179
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are currently 4 known similar but incompatible definitions of the command sets that can be sent to an NVDIMM through ACPI. It is also clear that future platform generations (ACPI or not) will continue to revise and extend the DIMM command set as new devices and use cases arrive. It is obviously untenable to continue to proliferate divergence of these command definitions, and to that end a standardization process has begun to provide for a unified specification. However, that leaves a problem about what to do with this first generation where vendors are already shipping divergence. The Linux kernel can support these initial diverged platforms without giving platform-firmware free reign to continue to diverge and compound kernel maintenance overhead. The kernel implementation can encourage standardization in two ways: 1/ Require that any function code that userspace wants to send be explicitly white-listed in the implementation. For ACPI this means function codes marked as supported by acpi_check_dsm() may only be invoked if they appear in the white-list. A function must be publicly documented before it is added to the white-list. 2/ The above restrictions can be trivially bypassed by using the "vendor-specific" payload command. However, since vendor-specific commands are by definition not publicly documented and have the potential to corrupt the kernel's view of the dimm state, we provide a toggle to disable vendor-specific operations. Enabling undefined behavior is a policy decision that can be made by the platform owner and encourages firmware implementations to choose public over private command implementations. Based on an initial patch from Jerry Hoemann Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | nfit, libnvdimm: clarify "commands" vs "_DSMs"Dan Williams2016-04-288-37/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clarify the distinction between "commands", the ioctls userspace calls to request the kernel take some action on a given dimm device, and "_DSMs", the actual function numbers used in the firmware interface to the DIMM. _DSMs are ACPI specific whereas commands are Linux kernel generic. This is in preparation for breaking the 1:1 implicit relationship between the kernel ioctl number space and the firmware specific function numbers. Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm: increase max envelope size for ioctlJerry Hoemann2016-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nd_ioctl() must first read in the fixed sized portion of an ioctl so that it can then determine the size of the variable part. Prepare for ND_CMD_CALL calls which have larger fixed portion envelope. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | acpi: widen acpi_evaluate_dsm() revision and function-index argumentsJerry Hoemann2016-04-112-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ACPI specification states that arguments "Revision ID" and "Function Index" to a _DSM are type "Integer." Type Integers are 64 bit quantities. The function evaluate_dsm specifies these types as simple "int" which are 32 bits. Widen type passed to acpi_evaluate_dsm and its callers and derived callers to pass correct type. acpi_check_dsm and acpi_evaluate_dsm_typed had similar issue and were corrected as well. This is in preparation for libnvdimm implementing a generic _DSM passthrough facility to have the capacity to pass 64-bit values as the ACPI specification allows. [djbw: clarify the changelog, add rationale] Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | | | Merge branch 'for-4.7/libnvdimm' into libnvdimm-for-nextDan Williams2016-05-186-6/+158
| |\ \ \ \
| | * | | | libnvdimm, btt: add btt startup debugDan Williams2016-04-221-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Report the reason for btt probe failures when debug is enabled. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | | libnvdimm, nfit: Use ACPI_SIG_NFIT instead of hard coded stringLee, Chun-Yi2016-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's minor but that's still better to use ACPI_SIG_NFIT instead of hard coded string. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | | libnvdimm, test: add mock SMART data payloadDan Williams2016-04-113-1/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide simulated SMART data to enable the ndctl implementation of SMART data retrieval and parsing. The payload is defined here, "Section 4.1 SMART and Health Info (Function Index 1)": http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | | libnvdimm, nfit: report multiple interface codes per-dimmDan Williams2016-04-112-3/+70
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with ACPI 6.1 an NFIT table will report multiple 'NVDIMM Control Region Structure' instances per-dimm, one for each supported format interface. Report that code in the following format in sysfs: nmemX/nfit/formats nmemX/nfit/format nmemX/nfit/format1 nmemX/nfit/format2 ... nmemX/nfit/formatN Where format2 - formatN are theoretical as there are no known DIMMs with support for more than two interface formats. This layout is compatible with existing libndctl binaries that only expect one code per-dimm as they will ignore nmemX/nfit/formats and nmemX/nfit/formatN. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | | | Merge branch 'for-4.7/dax' into libnvdimm-for-nextDan Williams2016-05-1820-594/+837
| |\ \ \ \ | | | |_|/ | | |/| |
| | * | | libnvdimm, dax: record the specified alignment of a dax-device instanceDan Williams2016-05-092-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to use the alignment as the allocation and mapping unit. Previously this information was only useful for establishing the data offset, but now it is important to remember the granularity for the later use. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, dax: reserve space to store labels for device-daxDan Williams2016-05-091-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We may want to subdivide a device-dax range into multiple devices so that each can have separate permissions or naming. Reserve 128K of label space by default so we have the capability of making allocation decisions persistent. This reservation is not something we can add later since it would result in the default size of a device-dax range changing between kernel versions. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, dax: introduce device-dax infrastructureDan Williams2016-05-0913-34/+264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and mapped without need of an intervening file system. This initial infrastructure arranges for a libnvdimm pfn-device to be represented as a different device-type so that it can be attached to a driver other than the pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm: cleanup nvdimm_namespace_common_probe(), kill 'host'Dan Williams2016-04-221-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'host' variable can be killed as it is always the same as the passed in device. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem: kill ->pmem_queue and ->pmem_diskDan Williams2016-04-221-13/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The devm conversion obviates the need to continue to remember the queue and disk locally in the driver. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem, pfn: move pfn setup to the coreDan Williams2016-04-223-184/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that pmem internals have been disentangled from pfn setup, that code can move to the core. This is in preparation for adding another user of the pfn-device capabilities. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem, pfn: make pmem_rw_bytes generic and refactor pfn setupDan Williams2016-04-229-173/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for providing an alternative (to block device) access mechanism to persistent memory, convert pmem_rw_bytes() to nsio_rw_bytes(). This allows ->rw_bytes() functionality without requiring a 'struct pmem_device' to be instantiated. In other words, when ->rw_bytes() is in use i/o is driven through 'struct nd_namespace_io', otherwise it is driven through 'struct pmem_device' and the block layer. This consolidates the disjoint calls to devm_exit_badblocks() and devm_memunmap() into a common devm_nsio_disable() and cleans up the init path to use a unified pmem_attach_disk() implementation. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem: clean up resource print / requestDan Williams2016-04-221-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The leading '0x' in front of %pa is redundant, also we can just use %pR to simplify the print statement. The request parameters can be directly taken from the resource as well. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem: use devm_add_action to release bdev resourcesDan Williams2016-04-221-49/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register a callback to clean up the request_queue and put the gendisk at driver disable time. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, blk: move i/o infrastructure to nd_namespace_blkDan Williams2016-04-222-68/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consolidate the information for issuing i/o to a blk-namespace, and eliminate some pointer chasing. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, blk: quiet i/o error reportingDan Williams2016-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I/O errors events have the potential to be a high frequency and a log message for each event can swamp the system. This message is also redundant with upper layer error reporting. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem: use ->queuedata for driver private dataDan Williams2016-04-221-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save a pointer chase by storing the driver private data in the request_queue rather than the gendisk. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, blk: use ->queuedata for driver private dataDan Williams2016-04-221-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save a pointer chase by storing the driver private data in the request_queue rather than the gendisk. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, blk: use devm_add_action to release bdev resourcesDan Williams2016-04-221-41/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register a callback to clean up the request_queue and put the gendisk at driver disable time. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, btt, convert nd_btt_probe() to devmDan Williams2016-04-225-28/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the device performing the probe so we can use a devm allocation for the btt superblock. Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pfn, convert nd_pfn_probe() to devmDan Williams2016-04-223-37/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the device performing the probe so we can use a devm allocation for the pfn superblock. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| | * | | libnvdimm, pmem: kill pmem->ndnsDan Williams2016-04-224-22/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can derive the common namespace from other information. We also do not need to cache it because all the usages are in slow paths. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | | Merge tag 'trace-v4.7-2' of ↵Linus Torvalds2016-05-223-4/+147
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull motr tracing updates from Steven Rostedt: "Three more changes. - I forgot that I had another selftest to stress test the ftrace instance creation. It was actually suppose to go into the 4.6 merge window, but I never committed it. I almost forgot about it again, but noticed it was missing from your tree. - Soumya PN sent me a clean up patch to not disable interrupts when taking the tasklist_lock for read, as it's unnecessary because that lock is never taken for write in irq context. - Newer gcc's can cause the jump in the function_graph code to the global ftrace_stub label to be a short jump instead of a long one. As that jump is dynamically converted to jump to the trace code to do function graph tracing, and that conversion expects a long jump it can corrupt the ftrace_stub itself (it's directly after that call). One way to prevent gcc from using a short jump is to declare the ftrace_stub as a weak function, which we do here to keep gcc from optimizing too much" * tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it ftrace: Don't disable irqs when taking the tasklist_lock read_lock ftracetest: Add instance created, delete, read and enable event test
| * | | | | ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to itSteven Rostedt2016-05-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Matt Fleming reported seeing crashes when enabling and disabling function profiling which uses function graph tracer. Later Namhyung Kim hit a similar issue and he found that the issue was due to the jmp to ftrace_stub in ftrace_graph_call was only two bytes, and when it was changed to jump to the tracing code, it overwrote the ftrace_stub that was after it. Masami Hiramatsu bisected this down to a binutils change: 8dcea93252a9ea7dff57e85220a719e2a5e8ab41 is the first bad commit commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri May 15 03:17:31 2015 -0700 Add -mshared option to x86 ELF assembler This patch adds -mshared option to x86 ELF assembler. By default, assembler will optimize out non-PLT relocations against defined non-weak global branch targets with default visibility. The -mshared option tells the assembler to generate code which may go into a shared library where all non-weak global branch targets with default visibility can be preempted. The resulting code is slightly bigger. This option only affects the handling of branch instructions. Declaring ftrace_stub as a weak call prevents gas from using two byte jumps to it, which would be converted to a jump to the function graph code. Link: http://lkml.kernel.org/r/20160516230035.1dbae571@gandalf.local.home Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Reported-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | ftrace: Don't disable irqs when taking the tasklist_lock read_lockSoumya PN2016-05-201-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ftrace.c inside the function alloc_retstack_tasklist() (which will be invoked when function_graph tracing is on) the tasklist_lock is being held as reader while iterating through a list of threads. Here the lock is being held as reader with irqs disabled. The tasklist_lock is never write_locked in interrupt context so it is safe to not disable interrupts for the duration of read_lock in this block which, can be significant, given the block of code iterates through all threads. Hence changing the code to call read_lock() and read_unlock() instead of read_lock_irqsave() and read_unlock_irqrestore(). A similar change was made in commits: 8063e41d2ffc ("tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")' and 3472eaa1f12e ("sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()")' Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com Signed-off-by: Soumya PN <soumya.p.n@hpe.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | ftracetest: Add instance created, delete, read and enable event testSteven Rostedt (Red Hat)2016-05-091-0/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new ftrace test that creates three threads. One that creates and removes an ftrace instance, one that reads the instance, and one that enables and disables events in the instance. This is a stress test for accessing and removing instances at the same time. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>