summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/stable/firewire-cdev103
-rw-r--r--Documentation/ABI/stable/sysfs-bus-firewire122
-rw-r--r--Documentation/ABI/stable/vdso27
-rw-r--r--Documentation/ABI/testing/sysfs-driver-hid-roccat-koneplus8
-rw-r--r--Documentation/ABI/testing/sysfs-driver-hid-wiimote10
-rw-r--r--Documentation/DocBook/80211.tmpl5
-rw-r--r--Documentation/DocBook/kernel-hacking.tmpl2
-rw-r--r--Documentation/SubmitChecklist2
-rw-r--r--Documentation/development-process/4.Coding2
-rw-r--r--Documentation/devicetree/bindings/arm/primecell.txt21
-rw-r--r--Documentation/devicetree/bindings/crypto/fsl-sec2.txt (renamed from Documentation/devicetree/bindings/powerpc/fsl/sec.txt)2
-rw-r--r--Documentation/devicetree/bindings/gpio/fsl-imx-gpio.txt22
-rw-r--r--Documentation/devicetree/bindings/gpio/gpio.txt46
-rw-r--r--Documentation/devicetree/bindings/gpio/gpio_nvidia.txt8
-rw-r--r--Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt22
-rw-r--r--Documentation/devicetree/bindings/spi/spi_nvidia.txt5
-rw-r--r--Documentation/devicetree/bindings/tty/serial/of-serial.txt36
-rw-r--r--Documentation/feature-removal-schedule.txt10
-rw-r--r--Documentation/filesystems/Locking8
-rw-r--r--Documentation/filesystems/porting27
-rw-r--r--Documentation/filesystems/ubifs.txt28
-rw-r--r--Documentation/filesystems/vfs.txt30
-rw-r--r--Documentation/ja_JP/SubmitChecklist2
-rw-r--r--Documentation/mmc/00-INDEX2
-rw-r--r--Documentation/mmc/mmc-async-req.txt87
-rw-r--r--Documentation/networking/ifenslave.c18
-rw-r--r--Documentation/networking/ip-sysctl.txt29
-rw-r--r--Documentation/networking/netdev-features.txt154
-rw-r--r--Documentation/networking/nfc.txt128
-rw-r--r--Documentation/networking/stmmac.txt200
-rw-r--r--Documentation/power/devices.txt14
-rw-r--r--Documentation/power/opp.txt2
-rw-r--r--Documentation/power/runtime_pm.txt229
-rw-r--r--Documentation/spi/ep93xx_spi10
-rw-r--r--Documentation/spi/pxa2xx5
-rw-r--r--Documentation/trace/kprobetrace.txt9
-rw-r--r--Documentation/vDSO/parse_vdso.c256
-rw-r--r--Documentation/vDSO/vdso_test.c111
-rw-r--r--Documentation/virtual/lguest/lguest.c47
-rw-r--r--Documentation/x86/entry_64.txt98
-rw-r--r--Documentation/zh_CN/SubmitChecklist2
41 files changed, 1669 insertions, 280 deletions
diff --git a/Documentation/ABI/stable/firewire-cdev b/Documentation/ABI/stable/firewire-cdev
new file mode 100644
index 000000000000..16d030827368
--- /dev/null
+++ b/Documentation/ABI/stable/firewire-cdev
@@ -0,0 +1,103 @@
+What: /dev/fw[0-9]+
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ The character device files /dev/fw* are the interface between
+ firewire-core and IEEE 1394 device drivers implemented in
+ userspace. The ioctl(2)- and read(2)-based ABI is defined and
+ documented in <linux/firewire-cdev.h>.
+
+ This ABI offers most of the features which firewire-core also
+ exposes to kernelspace IEEE 1394 drivers.
+
+ Each /dev/fw* is associated with one IEEE 1394 node, which can
+ be remote or local nodes. Operations on a /dev/fw* file have
+ different scope:
+ - The 1394 node which is associated with the file:
+ - Asynchronous request transmission
+ - Get the Configuration ROM
+ - Query node ID
+ - Query maximum speed of the path between this node
+ and local node
+ - The 1394 bus (i.e. "card") to which the node is attached to:
+ - Isochronous stream transmission and reception
+ - Asynchronous stream transmission and reception
+ - Asynchronous broadcast request transmission
+ - PHY packet transmission and reception
+ - Allocate, reallocate, deallocate isochronous
+ resources (channels, bandwidth) at the bus's IRM
+ - Query node IDs of local node, root node, IRM, bus
+ manager
+ - Query cycle time
+ - Bus reset initiation, bus reset event reception
+ - All 1394 buses:
+ - Allocation of IEEE 1212 address ranges on the local
+ link layers, reception of inbound requests to such
+ an address range, asynchronous response transmission
+ to inbound requests
+ - Addition of descriptors or directories to the local
+ nodes' Configuration ROM
+
+ Due to the different scope of operations and in order to let
+ userland implement different access permission models, some
+ operations are restricted to /dev/fw* files that are associated
+ with a local node:
+ - Addition of descriptors or directories to the local
+ nodes' Configuration ROM
+ - PHY packet transmission and reception
+
+ A /dev/fw* file remains associated with one particular node
+ during its entire life time. Bus topology changes, and hence
+ node ID changes, are tracked by firewire-core. ABI users do not
+ need to be aware of topology.
+
+ The following file operations are supported:
+
+ open(2)
+ Currently the only useful flags are O_RDWR.
+
+ ioctl(2)
+ Initiate various actions. Some take immediate effect, others
+ are performed asynchronously while or after the ioctl returns.
+ See the inline documentation in <linux/firewire-cdev.h> for
+ descriptions of all ioctls.
+
+ poll(2), select(2), epoll_wait(2) etc.
+ Watch for events to become available to be read.
+
+ read(2)
+ Receive various events. There are solicited events like
+ outbound asynchronous transaction completion or isochronous
+ buffer completion, and unsolicited events such as bus resets,
+ request reception, or PHY packet reception. Always use a read
+ buffer which is large enough to receive the largest event that
+ could ever arrive. See <linux/firewire-cdev.h> for descriptions
+ of all event types and for which ioctls affect reception of
+ events.
+
+ mmap(2)
+ Allocate a DMA buffer for isochronous reception or transmission
+ and map it into the process address space. The arguments should
+ be used as follows: addr = NULL, length = the desired buffer
+ size, i.e. number of packets times size of largest packet,
+ prot = at least PROT_READ for reception and at least PROT_WRITE
+ for transmission, flags = MAP_SHARED, fd = the handle to the
+ /dev/fw*, offset = 0.
+
+ Isochronous reception works in packet-per-buffer fashion except
+ for multichannel reception which works in buffer-fill mode.
+
+ munmap(2)
+ Unmap the isochronous I/O buffer from the process address space.
+
+ close(2)
+ Besides stopping and freeing I/O contexts that were associated
+ with the file descriptor, back out any changes to the local
+ nodes' Configuration ROM. Deallocate isochronous channels and
+ bandwidth at the IRM that were marked for kernel-assisted
+ re- and deallocation.
+
+Users: libraw1394
+ libdc1394
+ tools like jujuutils, fwhack, ...
diff --git a/Documentation/ABI/stable/sysfs-bus-firewire b/Documentation/ABI/stable/sysfs-bus-firewire
new file mode 100644
index 000000000000..3d484e5dc846
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-bus-firewire
@@ -0,0 +1,122 @@
+What: /sys/bus/firewire/devices/fw[0-9]+/
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ IEEE 1394 node device attributes.
+ Read-only. Mutable during the node device's lifetime.
+ See IEEE 1212 for semantic definitions.
+
+ config_rom
+ Contents of the Configuration ROM register.
+ Binary attribute; an array of host-endian u32.
+
+ guid
+ The node's EUI-64 in the bus information block of
+ Configuration ROM.
+ Hexadecimal string representation of an u64.
+
+
+What: /sys/bus/firewire/devices/fw[0-9]+/units
+Date: June 2009
+KernelVersion: 2.6.31
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ IEEE 1394 node device attribute.
+ Read-only. Mutable during the node device's lifetime.
+ See IEEE 1212 for semantic definitions.
+
+ units
+ Summary of all units present in an IEEE 1394 node.
+ Contains space-separated tuples of specifier_id and
+ version of each unit present in the node. Specifier_id
+ and version are hexadecimal string representations of
+ u24 of the respective unit directory entries.
+ Specifier_id and version within each tuple are separated
+ by a colon.
+
+Users: udev rules to set ownership and access permissions or ACLs of
+ /dev/fw[0-9]+ character device files
+
+
+What: /sys/bus/firewire/devices/fw[0-9]+[.][0-9]+/
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ IEEE 1394 unit device attributes.
+ Read-only. Immutable during the unit device's lifetime.
+ See IEEE 1212 for semantic definitions.
+
+ modalias
+ Same as MODALIAS in the uevent at device creation.
+
+ rom_index
+ Offset of the unit directory within the parent device's
+ (node device's) Configuration ROM, in quadlets.
+ Decimal string representation.
+
+
+What: /sys/bus/firewire/devices/*/
+Date: May 2007
+KernelVersion: 2.6.22
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ Attributes common to IEEE 1394 node devices and unit devices.
+ Read-only. Mutable during the node device's lifetime.
+ Immutable during the unit device's lifetime.
+ See IEEE 1212 for semantic definitions.
+
+ These attributes are only created if the root directory of an
+ IEEE 1394 node or the unit directory of an IEEE 1394 unit
+ actually contains according entries.
+
+ hardware_version
+ Hexadecimal string representation of an u24.
+
+ hardware_version_name
+ Contents of a respective textual descriptor leaf.
+
+ model
+ Hexadecimal string representation of an u24.
+
+ model_name
+ Contents of a respective textual descriptor leaf.
+
+ specifier_id
+ Hexadecimal string representation of an u24.
+ Mandatory in unit directories according to IEEE 1212.
+
+ vendor
+ Hexadecimal string representation of an u24.
+ Mandatory in the root directory according to IEEE 1212.
+
+ vendor_name
+ Contents of a respective textual descriptor leaf.
+
+ version
+ Hexadecimal string representation of an u24.
+ Mandatory in unit directories according to IEEE 1212.
+
+
+What: /sys/bus/firewire/drivers/sbp2/fw*/host*/target*/*:*:*:*/ieee1394_id
+ formerly
+ /sys/bus/ieee1394/drivers/sbp2/fw*/host*/target*/*:*:*:*/ieee1394_id
+Date: Feb 2004
+KernelVersion: 2.6.4
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ SCSI target port identifier and logical unit identifier of a
+ logical unit of an SBP-2 target. The identifiers are specified
+ in SAM-2...SAM-4 annex A. They are persistent and world-wide
+ unique properties the SBP-2 attached target.
+
+ Read-only attribute, immutable during the target's lifetime.
+ Format, as exposed by firewire-sbp2 since 2.6.22, May 2007:
+ Colon-separated hexadecimal string representations of
+ u64 EUI-64 : u24 directory_ID : u16 LUN
+ without 0x prefixes, without whitespace. The former sbp2 driver
+ (removed in 2.6.37 after being superseded by firewire-sbp2) used
+ a somewhat shorter format which was not as close to SAM.
+
+Users: udev rules to create /dev/disk/by-id/ symlinks
diff --git a/Documentation/ABI/stable/vdso b/Documentation/ABI/stable/vdso
new file mode 100644
index 000000000000..8a1cbb594497
--- /dev/null
+++ b/Documentation/ABI/stable/vdso
@@ -0,0 +1,27 @@
+On some architectures, when the kernel loads any userspace program it
+maps an ELF DSO into that program's address space. This DSO is called
+the vDSO and it often contains useful and highly-optimized alternatives
+to real syscalls.
+
+These functions are called just like ordinary C function according to
+your platform's ABI. Call them from a sensible context. (For example,
+if you set CS on x86 to something strange, the vDSO functions are
+within their rights to crash.) In addition, if you pass a bad
+pointer to a vDSO function, you might get SIGSEGV instead of -EFAULT.
+
+To find the DSO, parse the auxiliary vector passed to the program's
+entry point. The AT_SYSINFO_EHDR entry will point to the vDSO.
+
+The vDSO uses symbol versioning; whenever you request a symbol from the
+vDSO, specify the version you are expecting.
+
+Programs that dynamically link to glibc will use the vDSO automatically.
+Otherwise, you can use the reference parser in Documentation/vDSO/parse_vdso.c.
+
+Unless otherwise noted, the set of symbols with any given version and the
+ABI of those symbols is considered stable. It may vary across architectures,
+though.
+
+(As of this writing, this ABI documentation as been confirmed for x86_64.
+ The maintainers of the other vDSO-using architectures should confirm
+ that it is correct for their architecture.) \ No newline at end of file
diff --git a/Documentation/ABI/testing/sysfs-driver-hid-roccat-koneplus b/Documentation/ABI/testing/sysfs-driver-hid-roccat-koneplus
index c1b53b8bc2ae..65e6e5dd67e8 100644
--- a/Documentation/ABI/testing/sysfs-driver-hid-roccat-koneplus
+++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-koneplus
@@ -92,6 +92,14 @@ Description: The mouse has a tracking- and a distance-control-unit. These
This file is writeonly.
Users: http://roccat.sourceforge.net
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/koneplus/roccatkoneplus<minor>/talk
+Date: May 2011
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: Used to active some easy* functions of the mouse from outside.
+ The data has to be 16 bytes long.
+ This file is writeonly.
+Users: http://roccat.sourceforge.net
+
What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/<hid-bus>:<vendor-id>:<product-id>.<num>/koneplus/roccatkoneplus<minor>/tcu
Date: October 2010
Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
diff --git a/Documentation/ABI/testing/sysfs-driver-hid-wiimote b/Documentation/ABI/testing/sysfs-driver-hid-wiimote
new file mode 100644
index 000000000000..5d5a16ea57c6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-hid-wiimote
@@ -0,0 +1,10 @@
+What: /sys/bus/hid/drivers/wiimote/<dev>/led1
+What: /sys/bus/hid/drivers/wiimote/<dev>/led2
+What: /sys/bus/hid/drivers/wiimote/<dev>/led3
+What: /sys/bus/hid/drivers/wiimote/<dev>/led4
+Date: July 2011
+KernelVersion: 3.1
+Contact: David Herrmann <dh.herrmann@googlemail.com>
+Description: Make it possible to set/get current led state. Reading from it
+ returns 0 if led is off and 1 if it is on. Writing 0 to it
+ disables the led, writing 1 enables it.
diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl
index 8906648f962b..445289cd0e65 100644
--- a/Documentation/DocBook/80211.tmpl
+++ b/Documentation/DocBook/80211.tmpl
@@ -402,8 +402,9 @@
!Finclude/net/mac80211.h set_key_cmd
!Finclude/net/mac80211.h ieee80211_key_conf
!Finclude/net/mac80211.h ieee80211_key_flags
-!Finclude/net/mac80211.h ieee80211_tkip_key_type
-!Finclude/net/mac80211.h ieee80211_get_tkip_key
+!Finclude/net/mac80211.h ieee80211_get_tkip_p1k
+!Finclude/net/mac80211.h ieee80211_get_tkip_p1k_iv
+!Finclude/net/mac80211.h ieee80211_get_tkip_p2k
!Finclude/net/mac80211.h ieee80211_key_removed
</chapter>
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
index 7b3f49363413..07a9c48de5a2 100644
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -409,7 +409,7 @@ cond_resched(); /* Will sleep */
<para>
You should always compile your kernel
- <symbol>CONFIG_DEBUG_SPINLOCK_SLEEP</symbol> on, and it will warn
+ <symbol>CONFIG_DEBUG_ATOMIC_SLEEP</symbol> on, and it will warn
you if you break these rules. If you <emphasis>do</emphasis> break
the rules, you will eventually lock up your box.
</para>
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index da0382daa395..7b13be41c085 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -53,7 +53,7 @@ kernel patches.
12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
+ CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP all simultaneously
enabled.
13: Has been build- and runtime tested with and without CONFIG_SMP and
diff --git a/Documentation/development-process/4.Coding b/Documentation/development-process/4.Coding
index f3f1a469443c..83f5f5b365a3 100644
--- a/Documentation/development-process/4.Coding
+++ b/Documentation/development-process/4.Coding
@@ -244,7 +244,7 @@ testing purposes. In particular, you should turn on:
- DEBUG_SLAB can find a variety of memory allocation and use errors; it
should be used on most development kernels.
- - DEBUG_SPINLOCK, DEBUG_SPINLOCK_SLEEP, and DEBUG_MUTEXES will find a
+ - DEBUG_SPINLOCK, DEBUG_ATOMIC_SLEEP, and DEBUG_MUTEXES will find a
number of common locking errors.
There are quite a few other debugging options, some of which will be
diff --git a/Documentation/devicetree/bindings/arm/primecell.txt b/Documentation/devicetree/bindings/arm/primecell.txt
new file mode 100644
index 000000000000..1d5d7a870ec7
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/primecell.txt
@@ -0,0 +1,21 @@
+* ARM Primecell Peripherals
+
+ARM, Ltd. Primecell peripherals have a standard id register that can be used to
+identify the peripheral type, vendor, and revision. This value can be used for
+driver matching.
+
+Required properties:
+
+- compatible : should be a specific value for peripheral and "arm,primecell"
+
+Optional properties:
+
+- arm,primecell-periphid : Value to override the h/w value with
+
+Example:
+
+serial@fff36000 {
+ compatible = "arm,pl011", "arm,primecell";
+ arm,primecell-periphid = <0x00341011>;
+};
+
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/sec.txt b/Documentation/devicetree/bindings/crypto/fsl-sec2.txt
index 2b6f2d45c45a..38988ef1336b 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/sec.txt
+++ b/Documentation/devicetree/bindings/crypto/fsl-sec2.txt
@@ -1,4 +1,4 @@
-Freescale SoC SEC Security Engines
+Freescale SoC SEC Security Engines versions 2.x-3.x
Required properties:
diff --git a/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.txt b/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.txt
new file mode 100644
index 000000000000..4363ae4b3c14
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/fsl-imx-gpio.txt
@@ -0,0 +1,22 @@
+* Freescale i.MX/MXC GPIO controller
+
+Required properties:
+- compatible : Should be "fsl,<soc>-gpio"
+- reg : Address and length of the register set for the device
+- interrupts : Should be the port interrupt shared by all 32 pins, if
+ one number. If two numbers, the first one is the interrupt shared
+ by low 16 pins and the second one is for high 16 pins.
+- gpio-controller : Marks the device node as a gpio controller.
+- #gpio-cells : Should be two. The first cell is the pin number and
+ the second cell is used to specify optional parameters (currently
+ unused).
+
+Example:
+
+gpio0: gpio@73f84000 {
+ compatible = "fsl,imx51-gpio", "fsl,imx31-gpio";
+ reg = <0x73f84000 0x4000>;
+ interrupts = <50 51>;
+ gpio-controller;
+ #gpio-cells = <2>;
+};
diff --git a/Documentation/devicetree/bindings/gpio/gpio.txt b/Documentation/devicetree/bindings/gpio/gpio.txt
index edaa84d288a1..4e16ba4feab0 100644
--- a/Documentation/devicetree/bindings/gpio/gpio.txt
+++ b/Documentation/devicetree/bindings/gpio/gpio.txt
@@ -4,17 +4,45 @@ Specifying GPIO information for devices
1) gpios property
-----------------
-Nodes that makes use of GPIOs should define them using `gpios' property,
-format of which is: <&gpio-controller1-phandle gpio1-specifier
- &gpio-controller2-phandle gpio2-specifier
- 0 /* holes are permitted, means no GPIO 3 */
- &gpio-controller4-phandle gpio4-specifier
- ...>;
+Nodes that makes use of GPIOs should specify them using one or more
+properties, each containing a 'gpio-list':
-Note that gpio-specifier length is controller dependent.
+ gpio-list ::= <single-gpio> [gpio-list]
+ single-gpio ::= <gpio-phandle> <gpio-specifier>
+ gpio-phandle : phandle to gpio controller node
+ gpio-specifier : Array of #gpio-cells specifying specific gpio
+ (controller specific)
+
+GPIO properties should be named "[<name>-]gpios". Exact
+meaning of each gpios property must be documented in the device tree
+binding for each device.
+
+For example, the following could be used to describe gpios pins to use
+as chip select lines; with chip selects 0, 1 and 3 populated, and chip
+select 2 left empty:
+
+ gpio1: gpio1 {
+ gpio-controller
+ #gpio-cells = <2>;
+ };
+ gpio2: gpio2 {
+ gpio-controller
+ #gpio-cells = <1>;
+ };
+ [...]
+ chipsel-gpios = <&gpio1 12 0>,
+ <&gpio1 13 0>,
+ <0>, /* holes are permitted, means no GPIO 2 */
+ <&gpio2 2>;
+
+Note that gpio-specifier length is controller dependent. In the
+above example, &gpio1 uses 2 cells to specify a gpio, while &gpio2
+only uses one.
gpio-specifier may encode: bank, pin position inside the bank,
whether pin is open-drain and whether pin is logically inverted.
+Exact meaning of each specifier cell is controller specific, and must
+be documented in the device tree binding for the device.
Example of the node using GPIOs:
@@ -28,8 +56,8 @@ and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
2) gpio-controller nodes
------------------------
-Every GPIO controller node must have #gpio-cells property defined,
-this information will be used to translate gpio-specifiers.
+Every GPIO controller node must both an empty "gpio-controller"
+property, and have #gpio-cells contain the size of the gpio-specifier.
Example of two SOC GPIO banks defined as gpio-controller nodes:
diff --git a/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt
new file mode 100644
index 000000000000..eb4b530d64e1
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt
@@ -0,0 +1,8 @@
+NVIDIA Tegra 2 GPIO controller
+
+Required properties:
+- compatible : "nvidia,tegra20-gpio"
+- #gpio-cells : Should be two. The first cell is the pin number and the
+ second cell is used to specify optional parameters:
+ - bit 0 specifies polarity (0 for normal, 1 for inverted)
+- gpio-controller : Marks the device node as a GPIO controller.
diff --git a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
new file mode 100644
index 000000000000..9841057d112b
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
@@ -0,0 +1,22 @@
+* Freescale (Enhanced) Configurable Serial Peripheral Interface
+ (CSPI/eCSPI) for i.MX
+
+Required properties:
+- compatible : Should be "fsl,<soc>-cspi" or "fsl,<soc>-ecspi"
+- reg : Offset and length of the register set for the device
+- interrupts : Should contain CSPI/eCSPI interrupt
+- fsl,spi-num-chipselects : Contains the number of the chipselect
+- cs-gpios : Specifies the gpio pins to be used for chipselects.
+
+Example:
+
+ecspi@70010000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "fsl,imx51-ecspi";
+ reg = <0x70010000 0x4000>;
+ interrupts = <36>;
+ fsl,spi-num-chipselects = <2>;
+ cs-gpios = <&gpio3 24 0>, /* GPIO4_24 */
+ <&gpio3 25 0>; /* GPIO4_25 */
+};
diff --git a/Documentation/devicetree/bindings/spi/spi_nvidia.txt b/Documentation/devicetree/bindings/spi/spi_nvidia.txt
new file mode 100644
index 000000000000..6b9e51896693
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi_nvidia.txt
@@ -0,0 +1,5 @@
+NVIDIA Tegra 2 SPI device
+
+Required properties:
+- compatible : should be "nvidia,tegra20-spi".
+- gpios : should specify GPIOs used for chipselect.
diff --git a/Documentation/devicetree/bindings/tty/serial/of-serial.txt b/Documentation/devicetree/bindings/tty/serial/of-serial.txt
new file mode 100644
index 000000000000..b8b27b0aca10
--- /dev/null
+++ b/Documentation/devicetree/bindings/tty/serial/of-serial.txt
@@ -0,0 +1,36 @@
+* UART (Universal Asynchronous Receiver/Transmitter)
+
+Required properties:
+- compatible : one of:
+ - "ns8250"
+ - "ns16450"
+ - "ns16550a"
+ - "ns16550"
+ - "ns16750"
+ - "ns16850"
+ - "nvidia,tegra20-uart"
+ - "ibm,qpace-nwp-serial"
+ - "serial" if the port type is unknown.
+- reg : offset and length of the register set for the device.
+- interrupts : should contain uart interrupt.
+- clock-frequency : the input clock frequency for the UART.
+
+Optional properties:
+- current-speed : the current active speed of the UART.
+- reg-offset : offset to apply to the mapbase from the start of the registers.
+- reg-shift : quantity to shift the register offsets by.
+- reg-io-width : the size (in bytes) of the IO accesses that should be
+ performed on the device. There are some systems that require 32-bit
+ accesses to the UART (e.g. TI davinci).
+- used-by-rtas : set to indicate that the port is in use by the OpenFirmware
+ RTAS and should not be registered.
+
+Example:
+
+ uart@80230000 {
+ compatible = "ns8250";
+ reg = <0x80230000 0x100>;
+ clock-frequency = <3686400>;
+ interrupts = <10>;
+ reg-shift = <2>;
+ };
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index b1c921c27519..d59e71df5c5c 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -501,16 +501,6 @@ Who: NeilBrown <neilb@suse.de>
----------------------------
-What: cancel_rearming_delayed_work[queue]()
-When: 2.6.39
-
-Why: The functions have been superceded by cancel_delayed_work_sync()
- quite some time ago. The conversion is trivial and there is no
- in-kernel user left.
-Who: Tejun Heo <tj@kernel.org>
-
-----------------------------
-
What: Legacy, non-standard chassis intrusion detection interface.
When: June 2011
Why: The adm9240, w83792d and w83793 hardware monitoring drivers have
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 57d827d6071d..ca7e25292542 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -52,7 +52,7 @@ ata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
- int (*check_acl)(struct inode *, int, unsigned int);
+ int (*check_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
@@ -412,7 +412,7 @@ prototypes:
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *);
int (*release) (struct inode *, struct file *);
- int (*fsync) (struct file *, int datasync);
+ int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
@@ -438,9 +438,7 @@ prototypes:
locking rules:
All may block except for ->setlease.
- No VFS locks held on entry except for ->fsync and ->setlease.
-
-->fsync() has i_mutex on inode.
+ No VFS locks held on entry except for ->setlease.
->setlease has the file_list_lock held and must not sleep.
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 6e29954851a2..7f8861d341ea 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -400,10 +400,31 @@ a file off.
--
[mandatory]
-
---
-[mandatory]
->get_sb() is gone. Switch to use of ->mount(). Typically it's just
a matter of switching from calling get_sb_... to mount_... and changing the
function type. If you were doing it manually, just switch from setting ->mnt_root
to some pointer to returning that pointer. On errors return ERR_PTR(...).
+
+--
+[mandatory]
+ ->permission(), generic_permission() and ->check_acl() have lost flags
+argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.
+ generic_permission() has also lost the check_acl argument; if you want
+non-NULL to be used for that inode, put it into ->i_op->check_acl.
+
+--
+[mandatory]
+ If you implement your own ->llseek() you must handle SEEK_HOLE and
+SEEK_DATA. You can hanle this by returning -EINVAL, but it would be nicer to
+support it in some way. The generic handler assumes that the entire file is
+data and there is a virtual hole at the end of the file. So if the provided
+offset is less than i_size and SEEK_DATA is specified, return the same offset.
+If the above is true for the offset and you are given SEEK_HOLE, return the end
+of the file. If the offset is i_size or greater return -ENXIO in either case.
+
+[mandatory]
+ If you have your own ->fsync() you must make sure to call
+filemap_write_and_wait_range() so that all dirty pages are synced out properly.
+You must also keep in mind that ->fsync() is not called with i_mutex held
+anymore, so if you require i_mutex locking you must make sure to take it and
+release it yourself.
diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt
index 8e4fab639d9c..a0a61d2f389f 100644
--- a/Documentation/filesystems/ubifs.txt
+++ b/Documentation/filesystems/ubifs.txt
@@ -111,34 +111,6 @@ The following is an example of the kernel boot arguments to attach mtd0
to UBI and mount volume "rootfs":
ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
-
-Module Parameters for Debugging
-===============================
-
-When UBIFS has been compiled with debugging enabled, there are 2 module
-parameters that are available to control aspects of testing and debugging.
-
-debug_chks Selects extra checks that UBIFS can do while running:
-
- Check Flag value
-
- General checks 1
- Check Tree Node Cache (TNC) 2
- Check indexing tree size 4
- Check orphan area 8
- Check old indexing tree 16
- Check LEB properties (lprops) 32
- Check leaf nodes and inodes 64
-
-debug_tsts Selects a mode of testing, as follows:
-
- Test mode Flag value
-
- Failure mode for recovery testing 4
-
-For example, set debug_chks to 3 to enable general and TNC checks.
-
-
References
==========
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 88b9f5519af9..eff6617c9a0f 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -229,6 +229,8 @@ struct super_operations {
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
+ int (*nr_cached_objects)(struct super_block *);
+ void (*free_cached_objects)(struct super_block *, int);
};
All methods are called without any locks being held, unless otherwise
@@ -301,6 +303,26 @@ or bottom half).
quota_write: called by the VFS to write to filesystem quota file.
+ nr_cached_objects: called by the sb cache shrinking function for the
+ filesystem to return the number of freeable cached objects it contains.
+ Optional.
+
+ free_cache_objects: called by the sb cache shrinking function for the
+ filesystem to scan the number of objects indicated to try to free them.
+ Optional, but any filesystem implementing this method needs to also
+ implement ->nr_cached_objects for it to be called correctly.
+
+ We can't do anything with any errors that the filesystem might
+ encountered, hence the void return type. This will never be called if
+ the VM is trying to reclaim under GFP_NOFS conditions, hence this
+ method does not need to handle that situation itself.
+
+ Implementations must include conditional reschedule calls inside any
+ scanning loop that is done. This allows the VFS to determine
+ appropriate scan batch sizes without having to worry about whether
+ implementations will cause holdoff problems due to large scan batch
+ sizes.
+
Whoever sets up the inode is responsible for filling in the "i_op" field. This
is a pointer to a "struct inode_operations" which describes the methods that
can be performed on individual inodes.
@@ -333,8 +355,8 @@ struct inode_operations {
void * (*follow_link) (struct dentry *, struct nameidata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
void (*truncate) (struct inode *);
- int (*permission) (struct inode *, int, unsigned int);
- int (*check_acl)(struct inode *, int, unsigned int);
+ int (*permission) (struct inode *, int);
+ int (*check_acl)(struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
@@ -423,7 +445,7 @@ otherwise noted.
permission: called by the VFS to check for access rights on a POSIX-like
filesystem.
- May be called in rcu-walk mode (flags & IPERM_FLAG_RCU). If in rcu-walk
+ May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk
mode, the filesystem must check the permission without blocking or
storing to the inode.
@@ -755,7 +777,7 @@ struct file_operations {
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *);
int (*release) (struct inode *, struct file *);
- int (*fsync) (struct file *, int datasync);
+ int (*fsync) (struct file *, loff_t, loff_t, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
diff --git a/Documentation/ja_JP/SubmitChecklist b/Documentation/ja_JP/SubmitChecklist
index 2df4576f1173..cb5507b1ac81 100644
--- a/Documentation/ja_JP/SubmitChecklist
+++ b/Documentation/ja_JP/SubmitChecklist
@@ -68,7 +68,7 @@ Linux 銈兗銉嶃儷銉戙儍銉佹姇绋胯呭悜銇戙儊銈с儍銈儶銈广儓
12: CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_SLAB,
CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, CONFIG_DEBUG_SPINLOCK,
- CONFIG_DEBUG_SPINLOCK_SLEEP 銇撱倢銈夊叏銇︺倰鍚屾檪銇湁鍔广伀銇椼仸鍕曚綔纰鸿獚銈
+ CONFIG_DEBUG_ATOMIC_SLEEP 銇撱倢銈夊叏銇︺倰鍚屾檪銇湁鍔广伀銇椼仸鍕曚綔纰鸿獚銈
琛屻仯銇︺亸銇犮仌銇勩
13: CONFIG_SMP, CONFIG_PREEMPT 銈掓湁鍔广伀銇椼仧鍫村悎銇ㄧ劇鍔广伀銇椼仧鍫村悎銇浮鏂广仹
diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX
index 93dd7a714075..a9ba6720ffdf 100644
--- a/Documentation/mmc/00-INDEX
+++ b/Documentation/mmc/00-INDEX
@@ -4,3 +4,5 @@ mmc-dev-attrs.txt
- info on SD and MMC device attributes
mmc-dev-parts.txt
- info on SD and MMC device partitions
+mmc-async-req.txt
+ - info on mmc asynchronous requests
diff --git a/Documentation/mmc/mmc-async-req.txt b/Documentation/mmc/mmc-async-req.txt
new file mode 100644
index 000000000000..ae1907b10e4a
--- /dev/null
+++ b/Documentation/mmc/mmc-async-req.txt
@@ -0,0 +1,87 @@
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch makes the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead would not affect the MMC performance.
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when an MMC request ends and another MMC request begins.
+Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
+dma_unmap_sg are processing. Using non-blocking MMC requests makes it
+possible to prepare the caches for next job in parallel with an active
+MMC request.
+
+MMC block driver
+================
+
+The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
+The increase in throughput is proportional to the time it takes to
+prepare (major part of preparations are dma_map_sg() and dma_unmap_sg())
+a request and how fast the memory is. The faster the MMC/SD is the
+more significant the prepare request time becomes. Roughly the expected
+performance gain is 5% for large writes and 10% on large reads on a L2 cache
+platform. In power save mode, when clocks run on a lower frequency, the DMA
+preparation may cost even more. As long as these slower preparations are run
+in parallel with the transfer performance won't be affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function mmc_start_req().
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request it waits
+for completion of that request and starts the new one and returns. It
+doesn't wait for the new request to complete. If there is no ongoing
+request it starts the new request and returns immediately.
+
+MMC host extensions
+===================
+
+There are two optional members in the mmc_host_ops -- pre_req() and
+post_req() -- that the host driver may implement in order to move work
+to before and after the actual mmc_host_ops.request() function is called.
+In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
+descriptor, and post_req() runs the dma_unmap_sg().
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel
+with the previous transfer, since there is no previous request.
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. A way to optimize for this is to split the current
+request in two chunks, prepare the first chunk and start the request,
+and finally prepare the second chunk and start the transfer.
+
+Pseudocode to handle is_first_req scenario with minimal prepare overhead:
+
+if (is_first_req && req->size > threshold)
+ /* start MMC transfer for the complete transfer size */
+ mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+
+ /*
+ * Begin to prepare DMA while cmd is being processed by MMC.
+ * The first chunk of the request should take the same time
+ * to prepare as the "MMC process command time".
+ * If prepare time exceeds MMC cmd time
+ * the transfer is delayed, guesstimate max 4k as first chunk size.
+ */
+ prepare_1st_chunk_for_dma(req);
+ /* flush pending desc to the DMAC (dmaengine.h) */
+ dma_issue_pending(req->dma_desc);
+
+ prepare_2nd_chunk_for_dma(req);
+ /*
+ * The second issue_pending should be called before MMC runs out
+ * of the first chunk. If the MMC runs out of the first data chunk
+ * before this call, the transfer is delayed.
+ */
+ dma_issue_pending(req->dma_desc);
diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index 2bac9618c345..65968fbf1e49 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -260,7 +260,7 @@ int main(int argc, char *argv[])
case 'V': opt_V++; exclusive++; break;
case '?':
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -268,13 +268,13 @@ int main(int argc, char *argv[])
/* options check */
if (exclusive > 1) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
if (opt_v || opt_V) {
- printf(version);
+ printf("%s", version);
if (opt_V) {
res = 0;
goto out;
@@ -282,14 +282,14 @@ int main(int argc, char *argv[])
}
if (opt_u) {
- printf(usage_msg);
+ printf("%s", usage_msg);
res = 0;
goto out;
}
if (opt_h) {
- printf(usage_msg);
- printf(help_msg);
+ printf("%s", usage_msg);
+ printf("%s", help_msg);
res = 0;
goto out;
}
@@ -309,7 +309,7 @@ int main(int argc, char *argv[])
goto out;
} else {
/* Just show usage */
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -320,7 +320,7 @@ int main(int argc, char *argv[])
master_ifname = *spp++;
if (master_ifname == NULL) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -339,7 +339,7 @@ int main(int argc, char *argv[])
if (slave_ifname == NULL) {
if (opt_d || opt_c) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index bfe924217f24..db2a4067013c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -106,16 +106,6 @@ inet_peer_maxttl - INTEGER
when the number of entries in the pool is very small).
Measured in seconds.
-inet_peer_gc_mintime - INTEGER
- Minimum interval between garbage collection passes. This interval is
- in effect under high memory pressure on the pool.
- Measured in seconds.
-
-inet_peer_gc_maxtime - INTEGER
- Minimum interval between garbage collection passes. This interval is
- in effect under low (or absent) memory pressure on the pool.
- Measured in seconds.
-
TCP variables:
somaxconn - INTEGER
@@ -394,7 +384,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory
pressure.
- Default: 8K
+ Default: 1 page
default: initial size of receive buffer used by TCP sockets.
This value overrides net.core.rmem_default used by other protocols.
@@ -483,7 +473,7 @@ tcp_window_scaling - BOOLEAN
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
- Default: 4K
+ Default: 1 page
default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
@@ -553,13 +543,13 @@ udp_rmem_min - INTEGER
Minimal size of receive buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for receiving data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
- Default: 4096
+ Default: 1 page
udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
- Default: 4096
+ Default: 1 page
CIPSOv4 Variables:
@@ -1465,10 +1455,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max
Default is calculated at boot time from amount of available memory.
sctp_rmem - vector of 3 INTEGERs: min, default, max
- See tcp_rmem for a description.
+ Only the first value ("min") is used, "default" and "max" are
+ ignored.
+
+ min: Minimal size of receive buffer used by SCTP socket.
+ It is guaranteed to each SCTP socket (but not association) even
+ under moderate memory pressure.
+
+ Default: 1 page
sctp_wmem - vector of 3 INTEGERs: min, default, max
- See tcp_wmem for a description.
+ Currently this tunable has no effect.
addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
new file mode 100644
index 000000000000..4b1c0dcef84c
--- /dev/null
+++ b/Documentation/networking/netdev-features.txt
@@ -0,0 +1,154 @@
+Netdev features mess and how to get out from it alive
+=====================================================
+
+Author:
+ Micha艂 Miros艂aw <mirq-linux@rere.qmqm.pl>
+
+
+
+ Part I: Feature sets
+======================
+
+Long gone are the days when a network card would just take and give packets
+verbatim. Today's devices add multiple features and bugs (read: offloads)
+that relieve an OS of various tasks like generating and checking checksums,
+splitting packets, classifying them. Those capabilities and their state
+are commonly referred to as netdev features in Linux kernel world.
+
+There are currently three sets of features relevant to the driver, and
+one used internally by network core:
+
+ 1. netdev->hw_features set contains features whose state may possibly
+ be changed (enabled or disabled) for a particular device by user's
+ request. This set should be initialized in ndo_init callback and not
+ changed later.
+
+ 2. netdev->features set contains features which are currently enabled
+ for a device. This should be changed only by network core or in
+ error paths of ndo_set_features callback.
+
+ 3. netdev->vlan_features set contains features whose state is inherited
+ by child VLAN devices (limits netdev->features set). This is currently
+ used for all VLAN devices whether tags are stripped or inserted in
+ hardware or software.
+
+ 4. netdev->wanted_features set contains feature set requested by user.
+ This set is filtered by ndo_fix_features callback whenever it or
+ some device-specific conditions change. This set is internal to
+ networking core and should not be referenced in drivers.
+
+
+
+ Part II: Controlling enabled features
+=======================================
+
+When current feature set (netdev->features) is to be changed, new set
+is calculated and filtered by calling ndo_fix_features callback
+and netdev_fix_features(). If the resulting set differs from current
+set, it is passed to ndo_set_features callback and (if the callback
+returns success) replaces value stored in netdev->features.
+NETDEV_FEAT_CHANGE notification is issued after that whenever current
+set might have changed.
+
+The following events trigger recalculation:
+ 1. device's registration, after ndo_init returned success
+ 2. user requested changes in features state
+ 3. netdev_update_features() is called
+
+ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
+are treated as always returning success.
+
+A driver that wants to trigger recalculation must do so by calling
+netdev_update_features() while holding rtnl_lock. This should not be done
+from ndo_*_features callbacks. netdev->features should not be modified by
+driver except by means of ndo_fix_features callback.
+
+
+
+ Part III: Implementation hints
+================================
+
+ * ndo_fix_features:
+
+All dependencies between features should be resolved here. The resulting
+set can be reduced further by networking core imposed limitations (as coded
+in netdev_fix_features()). For this reason it is safer to disable a feature
+when its dependencies are not met instead of forcing the dependency on.
+
+This callback should not modify hardware nor driver state (should be
+stateless). It can be called multiple times between successive
+ndo_set_features calls.
+
+Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
+NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
+care must be taken as the change won't affect already configured VLANs.
+
+ * ndo_set_features:
+
+Hardware should be reconfigured to match passed feature set. The set
+should not be altered unless some error condition happens that can't
+be reliably detected in ndo_fix_features. In this case, the callback
+should update netdev->features to match resulting hardware state.
+Errors returned are not (and cannot be) propagated anywhere except dmesg.
+(Note: successful return is zero, >0 means silent error.)
+
+
+
+ Part IV: Features
+===================
+
+For current list of features, see include/linux/netdev_features.h.
+This section describes semantics of some of them.
+
+ * Transmit checksumming
+
+For complete description, see comments near the top of include/linux/skbuff.h.
+
+Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
+It means that device can fill TCP/UDP-like checksum anywhere in the packets
+whatever headers there might be.
+
+ * Transmit TCP segmentation offload
+
+NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
+set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
+
+ * Transmit DMA from high memory
+
+On platforms where this is relevant, NETIF_F_HIGHDMA signals that
+ndo_start_xmit can handle skbs with frags in high memory.
+
+ * Transmit scatter-gather
+
+Those features say that ndo_start_xmit can handle fragmented skbs:
+NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
+chained skbs (skb->next/prev list).
+
+ * Software features
+
+Features contained in NETIF_F_SOFT_FEATURES are features of networking
+stack. Driver should not change behaviour based on them.
+
+ * LLTX driver (deprecated for hardware drivers)
+
+NETIF_F_LLTX should be set in drivers that implement their own locking in
+transmit path or don't need locking at all (e.g. software tunnels).
+In ndo_start_xmit, it is recommended to use a try_lock and return
+NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly
+protect against other callbacks (the rules you need to find out).
+
+Don't use it for new drivers.
+
+ * netns-local device
+
+NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
+network namespaces (e.g. loopback).
+
+Don't use it in drivers.
+
+ * VLAN challenged
+
+NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
+headers. Some drivers set this because the cards can't handle the bigger MTU.
+[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
+VLANs. This may be not useful, though.]
diff --git a/Documentation/networking/nfc.txt b/Documentation/networking/nfc.txt
new file mode 100644
index 000000000000..b24c29bdae27
--- /dev/null
+++ b/Documentation/networking/nfc.txt
@@ -0,0 +1,128 @@
+Linux NFC subsystem
+===================
+
+The Near Field Communication (NFC) subsystem is required to standardize the
+NFC device drivers development and to create an unified userspace interface.
+
+This document covers the architecture overview, the device driver interface
+description and the userspace interface description.
+
+Architecture overview
+---------------------
+
+The NFC subsystem is responsible for:
+ - NFC adapters management;
+ - Polling for targets;
+ - Low-level data exchange;
+
+The subsystem is divided in some parts. The 'core' is responsible for
+providing the device driver interface. On the other side, it is also
+responsible for providing an interface to control operations and low-level
+data exchange.
+
+The control operations are available to userspace via generic netlink.
+
+The low-level data exchange interface is provided by the new socket family
+PF_NFC. The NFC_SOCKPROTO_RAW performs raw communication with NFC targets.
+
+
+ +--------------------------------------+
+ | USER SPACE |
+ +--------------------------------------+
+ ^ ^
+ | low-level | control
+ | data exchange | operations
+ | |
+ | v
+ | +-----------+
+ | AF_NFC | netlink |
+ | socket +-----------+
+ | raw ^
+ | |
+ v v
+ +---------+ +-----------+
+ | rawsock | <--------> | core |
+ +---------+ +-----------+
+ ^
+ |
+ v
+ +-----------+
+ | driver |
+ +-----------+
+
+Device Driver Interface
+-----------------------
+
+When registering on the NFC subsystem, the device driver must inform the core
+of the set of supported NFC protocols and the set of ops callbacks. The ops
+callbacks that must be implemented are the following:
+
+* start_poll - setup the device to poll for targets
+* stop_poll - stop on progress polling operation
+* activate_target - select and initialize one of the targets found
+* deactivate_target - deselect and deinitialize the selected target
+* data_exchange - send data and receive the response (transceive operation)
+
+Userspace interface
+--------------------
+
+The userspace interface is divided in control operations and low-level data
+exchange operation.
+
+CONTROL OPERATIONS:
+
+Generic netlink is used to implement the interface to the control operations.
+The operations are composed by commands and events, all listed below:
+
+* NFC_CMD_GET_DEVICE - get specific device info or dump the device list
+* NFC_CMD_START_POLL - setup a specific device to polling for targets
+* NFC_CMD_STOP_POLL - stop the polling operation in a specific device
+* NFC_CMD_GET_TARGET - dump the list of targets found by a specific device
+
+* NFC_EVENT_DEVICE_ADDED - reports an NFC device addition
+* NFC_EVENT_DEVICE_REMOVED - reports an NFC device removal
+* NFC_EVENT_TARGETS_FOUND - reports START_POLL results when 1 or more targets
+are found
+
+The user must call START_POLL to poll for NFC targets, passing the desired NFC
+protocols through NFC_ATTR_PROTOCOLS attribute. The device remains in polling
+state until it finds any target. However, the user can stop the polling
+operation by calling STOP_POLL command. In this case, it will be checked if
+the requester of STOP_POLL is the same of START_POLL.
+
+If the polling operation finds one or more targets, the event TARGETS_FOUND is
+sent (including the device id). The user must call GET_TARGET to get the list of
+all targets found by such device. Each reply message has target attributes with
+relevant information such as the supported NFC protocols.
+
+All polling operations requested through one netlink socket are stopped when
+it's closed.
+
+LOW-LEVEL DATA EXCHANGE:
+
+The userspace must use PF_NFC sockets to perform any data communication with
+targets. All NFC sockets use AF_NFC:
+
+struct sockaddr_nfc {
+ sa_family_t sa_family;
+ __u32 dev_idx;
+ __u32 target_idx;
+ __u32 nfc_protocol;
+};
+
+To establish a connection with one target, the user must create an
+NFC_SOCKPROTO_RAW socket and call the 'connect' syscall with the sockaddr_nfc
+struct correctly filled. All information comes from NFC_EVENT_TARGETS_FOUND
+netlink event. As a target can support more than one NFC protocol, the user
+must inform which protocol it wants to use.
+
+Internally, 'connect' will result in an activate_target call to the driver.
+When the socket is closed, the target is deactivated.
+
+The data format exchanged through the sockets is NFC protocol dependent. For
+instance, when communicating with MIFARE tags, the data exchanged are MIFARE
+commands and their responses.
+
+The first received package is the response to the first sent package and so
+on. In order to allow valid "empty" responses, every data received has a NULL
+header of 1 byte.
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 80a7a3454902..57a24108b845 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -7,7 +7,7 @@ This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
(Synopsys IP blocks); it has been fully tested on STLinux platforms.
Currently this network device driver is for all STM embedded MAC/GMAC
-(7xxx SoCs). Other platforms start using it i.e. ARM SPEAr.
+(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr.
DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100
Universal version 4.0 have been used for developing the first code
@@ -71,7 +71,7 @@ Several performance tests on STM platforms showed this optimisation allows to sp
the CPU while having the maximum throughput.
4.4) WOL
-Wake up on Lan feature through Magic Frame is only supported for the GMAC
+Wake up on Lan feature through Magic and Unicast frames are supported for the GMAC
core.
4.5) DMA descriptors
@@ -91,11 +91,15 @@ LRO is not supported.
The driver is compatible with PAL to work with PHY and GPHY devices.
4.9) Platform information
-Several information came from the platform; please refer to the
-driver's Header file in include/linux directory.
+Several driver's information can be passed through the platform
+These are included in the include/linux/stmmac.h header file
+and detailed below as well:
-struct plat_stmmacenet_data {
+ struct plat_stmmacenet_data {
int bus_id;
+ int phy_addr;
+ int interface;
+ struct stmmac_mdio_bus_data *mdio_bus_data;
int pbl;
int clk_csr;
int has_gmac;
@@ -103,67 +107,135 @@ struct plat_stmmacenet_data {
int tx_coe;
int bugged_jumbo;
int pmt;
- void (*fix_mac_speed)(void *priv, unsigned int speed);
- void (*bus_setup)(unsigned long ioaddr);
-#ifdef CONFIG_STM_DRIVERS
- struct stm_pad_config *pad_config;
-#endif
- void *bsp_priv;
-};
+ int force_sf_dma_mode;
+ void (*fix_mac_speed)(void *priv, unsigned int speed);
+ void (*bus_setup)(void __iomem *ioaddr);
+ int (*init)(struct platform_device *pdev);
+ void (*exit)(struct platform_device *pdev);
+ void *bsp_priv;
+ };
Where:
-- pbl (Programmable Burst Length) is maximum number of
- beats to be transferred in one DMA transaction.
- GMAC also enables the 4xPBL by default.
-- fix_mac_speed and bus_setup are used to configure internal target
- registers (on STM platforms);
-- has_gmac: GMAC core is on board (get it at run-time in the next step);
-- bus_id: bus identifier.
-- tx_coe: core is able to perform the tx csum in HW.
-- enh_desc: if sets the MAC will use the enhanced descriptor structure.
-- clk_csr: CSR Clock range selection.
-- bugged_jumbo: some HWs are not able to perform the csum in HW for
- over-sized frames due to limited buffer sizes. Setting this
- flag the csum will be done in SW on JUMBO frames.
-
-struct plat_stmmacphy_data {
- int bus_id;
- int phy_addr;
- unsigned int phy_mask;
- int interface;
- int (*phy_reset)(void *priv);
- void *priv;
-};
+ o bus_id: bus identifier.
+ o phy_addr: the physical address can be passed from the platform.
+ If it is set to -1 the driver will automatically
+ detect it at run-time by probing all the 32 addresses.
+ o interface: PHY device's interface.
+ o mdio_bus_data: specific platform fields for the MDIO bus.
+ o pbl: the Programmable Burst Length is maximum number of beats to
+ be transferred in one DMA transaction.
+ GMAC also enables the 4xPBL by default.
+ o clk_csr: CSR Clock range selection.
+ o has_gmac: uses the GMAC core.
+ o enh_desc: if sets the MAC will use the enhanced descriptor structure.
+ o tx_coe: core is able to perform the tx csum in HW.
+ o bugged_jumbo: some HWs are not able to perform the csum in HW for
+ over-sized frames due to limited buffer sizes.
+ Setting this flag the csum will be done in SW on
+ JUMBO frames.
+ o pmt: core has the embedded power module (optional).
+ o force_sf_dma_mode: force DMA to use the Store and Forward mode
+ instead of the Threshold.
+ o fix_mac_speed: this callback is used for modifying some syscfg registers
+ (on ST SoCs) according to the link speed negotiated by the
+ physical layer .
+ o bus_setup: perform HW setup of the bus. For example, on some ST platforms
+ this field is used to configure the AMBA bridge to generate more
+ efficient STBus traffic.
+ o init/exit: callbacks used for calling a custom initialisation;
+ this is sometime necessary on some platforms (e.g. ST boxes)
+ where the HW needs to have set some PIO lines or system cfg
+ registers.
+ o custom_cfg: this is a custom configuration that can be passed while
+ initialising the resources.
+
+The we have:
+
+ struct stmmac_mdio_bus_data {
+ int bus_id;
+ int (*phy_reset)(void *priv);
+ unsigned int phy_mask;
+ int *irqs;
+ int probed_phy_irq;
+ };
Where:
-- bus_id: bus identifier;
-- phy_addr: physical address used for the attached phy device;
- set it to -1 to get it at run-time;
-- interface: physical MII interface mode;
-- phy_reset: hook to reset HW function.
-
-SOURCES:
-- Kconfig
-- Makefile
-- stmmac_main.c: main network device driver;
-- stmmac_mdio.c: mdio functions;
-- stmmac_ethtool.c: ethtool support;
-- stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
- Only tested on ST40 platforms based.
-- stmmac.h: private driver structure;
-- common.h: common definitions and VFTs;
-- descs.h: descriptor structure definitions;
-- dwmac1000_core.c: GMAC core functions;
-- dwmac1000_dma.c: dma functions for the GMAC chip;
-- dwmac1000.h: specific header file for the GMAC;
-- dwmac100_core: MAC 100 core and dma code;
-- dwmac100_dma.c: dma funtions for the MAC chip;
-- dwmac1000.h: specific header file for the MAC;
-- dwmac_lib.c: generic DMA functions shared among chips
-- enh_desc.c: functions for handling enhanced descriptors
-- norm_desc.c: functions for handling normal descriptors
-
-TODO:
-- XGMAC controller is not supported.
-- Review the timer optimisation code to use an embedded device that seems to be
+ o bus_id: bus identifier;
+ o phy_reset: hook to reset the phy device attached to the bus.
+ o phy_mask: phy mask passed when register the MDIO bus within the driver.
+ o irqs: list of IRQs, one per PHY.
+ o probed_phy_irq: if irqs is NULL, use this for probed PHY.
+
+Below an example how the structures above are using on ST platforms.
+
+ static struct plat_stmmacenet_data stxYYY_ethernet_platform_data = {
+ .pbl = 32,
+ .has_gmac = 0,
+ .enh_desc = 0,
+ .fix_mac_speed = stxYYY_ethernet_fix_mac_speed,
+ |
+ |-> to write an internal syscfg
+ | on this platform when the
+ | link speed changes from 10 to
+ | 100 and viceversa
+ .init = &stmmac_claim_resource,
+ |
+ |-> On ST SoC this calls own "PAD"
+ | manager framework to claim
+ | all the resources necessary
+ | (GPIO ...). The .custom_cfg field
+ | is used to pass a custom config.
+};
+
+Below the usage of the stmmac_mdio_bus_data: on this SoC, in fact,
+there are two MAC cores: one MAC is for MDIO Bus/PHY emulation
+with fixed_link support.
+
+static struct stmmac_mdio_bus_data stmmac1_mdio_bus = {
+ .bus_id = 1,
+ |
+ |-> phy device on the bus_id 1
+ .phy_reset = phy_reset;
+ |
+ |-> function to provide the phy_reset on this board
+ .phy_mask = 0,
+};
+
+static struct fixed_phy_status stmmac0_fixed_phy_status = {
+ .link = 1,
+ .speed = 100,
+ .duplex = 1,
+};
+
+During the board's device_init we can configure the first
+MAC for fixed_link by calling:
+ fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status));)
+and the second one, with a real PHY device attached to the bus,
+by using the stmmac_mdio_bus_data structure (to provide the id, the
+reset procedure etc).
+
+4.10) List of source files:
+ o Kconfig
+ o Makefile
+ o stmmac_main.c: main network device driver;
+ o stmmac_mdio.c: mdio functions;
+ o stmmac_ethtool.c: ethtool support;
+ o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
+ Only tested on ST40 platforms based.
+ o stmmac.h: private driver structure;
+ o common.h: common definitions and VFTs;
+ o descs.h: descriptor structure definitions;
+ o dwmac1000_core.c: GMAC core functions;
+ o dwmac1000_dma.c: dma functions for the GMAC chip;
+ o dwmac1000.h: specific header file for the GMAC;
+ o dwmac100_core: MAC 100 core and dma code;
+ o dwmac100_dma.c: dma funtions for the MAC chip;
+ o dwmac1000.h: specific header file for the MAC;
+ o dwmac_lib.c: generic DMA functions shared among chips
+ o enh_desc.c: functions for handling enhanced descriptors
+ o norm_desc.c: functions for handling normal descriptors
+
+5) TODO:
+ o XGMAC is not supported.
+ o Review the timer optimisation code to use an embedded device that will be
available in new chip generations.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 64565aac6e40..3384d5996be2 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -506,8 +506,8 @@ routines. Nevertheless, different callback pointers are used in case there is a
situation where it actually matters.
-Device Power Domains
---------------------
+Device Power Management Domains
+-------------------------------
Sometimes devices share reference clocks or other power resources. In those
cases it generally is not possible to put devices into low-power states
individually. Instead, a set of devices sharing a power resource can be put
@@ -516,8 +516,8 @@ power resource. Of course, they also need to be put into the full-power state
together, by turning the shared power resource on. A set of devices with this
property is often referred to as a power domain.
-Support for power domains is provided through the pwr_domain field of struct
-device. This field is a pointer to an object of type struct dev_power_domain,
+Support for power domains is provided through the pm_domain field of struct
+device. This field is a pointer to an object of type struct dev_pm_domain,
defined in include/linux/pm.h, providing a set of power management callbacks
analogous to the subsystem-level and device driver callbacks that are executed
for the given device during all power transitions, instead of the respective
@@ -604,7 +604,7 @@ state temporarily, for example so that its system wakeup capability can be
disabled. This all depends on the hardware and the design of the subsystem and
device driver in question.
-During system-wide resume from a sleep state it's best to put devices into the
-full-power state, as explained in Documentation/power/runtime_pm.txt. Refer to
-that document for more information regarding this particular issue as well as
+During system-wide resume from a sleep state it's easiest to put devices into
+the full-power state, as explained in Documentation/power/runtime_pm.txt. Refer
+to that document for more information regarding this particular issue as well as
for information on the device runtime power management framework in general.
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
index 5ae70a12c1e2..3035d00757ad 100644
--- a/Documentation/power/opp.txt
+++ b/Documentation/power/opp.txt
@@ -321,6 +321,8 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with
addition to CONFIG_PM as power management feature is required to
dynamically scale voltage and frequency in a system.
+opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table
+
7. Data Structures
==================
Typically an SoC contains multiple voltage domains which are variable. Each
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index b24875b1ced5..14dd3c6ad97e 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -1,39 +1,39 @@
-Run-time Power Management Framework for I/O Devices
+Runtime Power Management Framework for I/O Devices
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
1. Introduction
-Support for run-time power management (run-time PM) of I/O devices is provided
+Support for runtime power management (runtime PM) of I/O devices is provided
at the power management core (PM core) level by means of:
* The power management workqueue pm_wq in which bus types and device drivers can
put their PM-related work items. It is strongly recommended that pm_wq be
- used for queuing all work items related to run-time PM, because this allows
+ used for queuing all work items related to runtime PM, because this allows
them to be synchronized with system-wide power transitions (suspend to RAM,
hibernation and resume from system sleep states). pm_wq is declared in
include/linux/pm_runtime.h and defined in kernel/power/main.c.
-* A number of run-time PM fields in the 'power' member of 'struct device' (which
+* A number of runtime PM fields in the 'power' member of 'struct device' (which
is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
- be used for synchronizing run-time PM operations with one another.
+ be used for synchronizing runtime PM operations with one another.
-* Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in
+* Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
include/linux/pm.h).
* A set of helper functions defined in drivers/base/power/runtime.c that can be
- used for carrying out run-time PM operations in such a way that the
+ used for carrying out runtime PM operations in such a way that the
synchronization between them is taken care of by the PM core. Bus types and
device drivers are encouraged to use these functions.
-The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM
+The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
fields of 'struct dev_pm_info' and the core helper functions provided for
-run-time PM are described below.
+runtime PM are described below.
-2. Device Run-time PM Callbacks
+2. Device Runtime PM Callbacks
-There are three device run-time PM callbacks defined in 'struct dev_pm_ops':
+There are three device runtime PM callbacks defined in 'struct dev_pm_ops':
struct dev_pm_ops {
...
@@ -72,11 +72,11 @@ knows what to do to handle the device).
not mean that the device has been put into a low power state. It is
supposed to mean, however, that the device will not process data and will
not communicate with the CPU(s) and RAM until the subsystem-level resume
- callback is executed for it. The run-time PM status of a device after
+ callback is executed for it. The runtime PM status of a device after
successful execution of the subsystem-level suspend callback is 'suspended'.
* If the subsystem-level suspend callback returns -EBUSY or -EAGAIN,
- the device's run-time PM status is 'active', which means that the device
+ the device's runtime PM status is 'active', which means that the device
_must_ be fully operational afterwards.
* If the subsystem-level suspend callback returns an error code different
@@ -104,7 +104,7 @@ the device).
* Once the subsystem-level resume callback has completed successfully, the PM
core regards the device as fully operational, which means that the device
- _must_ be able to complete I/O operations as needed. The run-time PM status
+ _must_ be able to complete I/O operations as needed. The runtime PM status
of the device is then 'active'.
* If the subsystem-level resume callback returns an error code, the PM core
@@ -130,7 +130,7 @@ device in that case. The value returned by this callback is ignored by the PM
core.
The helper functions provided by the PM core, described in Section 4, guarantee
-that the following constraints are met with respect to the bus type's run-time
+that the following constraints are met with respect to the bus type's runtime
PM callbacks:
(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
@@ -142,7 +142,7 @@ PM callbacks:
(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
devices (i.e. the PM core will only execute ->runtime_idle() or
- ->runtime_suspend() for the devices the run-time PM status of which is
+ ->runtime_suspend() for the devices the runtime PM status of which is
'active').
(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
@@ -151,7 +151,7 @@ PM callbacks:
flag of which is set.
(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
- PM core will only execute ->runtime_resume() for the devices the run-time
+ PM core will only execute ->runtime_resume() for the devices the runtime
PM status of which is 'suspended').
Additionally, the helper functions provided by the PM core obey the following
@@ -171,9 +171,9 @@ rules:
scheduled requests to execute the other callbacks for the same device,
except for scheduled autosuspends.
-3. Run-time PM Device Fields
+3. Runtime PM Device Fields
-The following device run-time PM fields are present in 'struct dev_pm_info', as
+The following device runtime PM fields are present in 'struct dev_pm_info', as
defined in include/linux/pm.h:
struct timer_list suspend_timer;
@@ -205,7 +205,7 @@ defined in include/linux/pm.h:
unsigned int disable_depth;
- used for disabling the helper funcions (they work normally if this is
- equal to zero); the initial value of it is 1 (i.e. run-time PM is
+ equal to zero); the initial value of it is 1 (i.e. runtime PM is
initially disabled for all devices)
unsigned int runtime_error;
@@ -229,10 +229,10 @@ defined in include/linux/pm.h:
suspend to complete; means "start a resume as soon as you've suspended"
unsigned int run_wake;
- - set if the device is capable of generating run-time wake-up events
+ - set if the device is capable of generating runtime wake-up events
enum rpm_status runtime_status;
- - the run-time PM status of the device; this field's initial value is
+ - the runtime PM status of the device; this field's initial value is
RPM_SUSPENDED, which means that each device is initially regarded by the
PM core as 'suspended', regardless of its real hardware status
@@ -243,7 +243,7 @@ defined in include/linux/pm.h:
and pm_runtime_forbid() helper functions
unsigned int no_callbacks;
- - indicates that the device does not use the run-time PM callbacks (see
+ - indicates that the device does not use the runtime PM callbacks (see
Section 8); it may be modified only by the pm_runtime_no_callbacks()
helper function
@@ -270,16 +270,16 @@ defined in include/linux/pm.h:
All of the above fields are members of the 'power' member of 'struct device'.
-4. Run-time PM Device Helper Functions
+4. Runtime PM Device Helper Functions
-The following run-time PM helper functions are defined in
+The following runtime PM helper functions are defined in
drivers/base/power/runtime.c and include/linux/pm_runtime.h:
void pm_runtime_init(struct device *dev);
- - initialize the device run-time PM fields in 'struct dev_pm_info'
+ - initialize the device runtime PM fields in 'struct dev_pm_info'
void pm_runtime_remove(struct device *dev);
- - make sure that the run-time PM of the device will be disabled after
+ - make sure that the runtime PM of the device will be disabled after
removing the device from device hierarchy
int pm_runtime_idle(struct device *dev);
@@ -289,9 +289,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
int pm_runtime_suspend(struct device *dev);
- execute the subsystem-level suspend callback for the device; returns 0 on
- success, 1 if the device's run-time PM status was already 'suspended', or
+ success, 1 if the device's runtime PM status was already 'suspended', or
error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt
- to suspend the device again in future
+ to suspend the device again in future and -EACCES means that
+ 'power.disable_depth' is different from 0
int pm_runtime_autosuspend(struct device *dev);
- same as pm_runtime_suspend() except that the autosuspend delay is taken
@@ -301,10 +302,11 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
int pm_runtime_resume(struct device *dev);
- execute the subsystem-level resume callback for the device; returns 0 on
- success, 1 if the device's run-time PM status was already 'active' or
+ success, 1 if the device's runtime PM status was already 'active' or
error code on failure, where -EAGAIN means it may be safe to attempt to
resume the device again in future, but 'power.runtime_error' should be
- checked additionally
+ checked additionally, and -EACCES means that 'power.disable_depth' is
+ different from 0
int pm_request_idle(struct device *dev);
- submit a request to execute the subsystem-level idle callback for the
@@ -321,7 +323,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
device in future, where 'delay' is the time to wait before queuing up a
suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
item is queued up immediately); returns 0 on success, 1 if the device's PM
- run-time status was already 'suspended', or error code if the request
+ runtime status was already 'suspended', or error code if the request
hasn't been scheduled (or queued up if 'delay' is 0); if the execution of
->runtime_suspend() is already scheduled and not yet expired, the new
value of 'delay' will be used as the time to wait
@@ -329,7 +331,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
int pm_request_resume(struct device *dev);
- submit a request to execute the subsystem-level resume callback for the
device (the request is represented by a work item in pm_wq); returns 0 on
- success, 1 if the device's run-time PM status was already 'active', or
+ success, 1 if the device's runtime PM status was already 'active', or
error code if the request hasn't been queued up
void pm_runtime_get_noresume(struct device *dev);
@@ -367,22 +369,32 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
pm_runtime_autosuspend(dev) and return its result
void pm_runtime_enable(struct device *dev);
- - enable the run-time PM helper functions to run the device bus type's
- run-time PM callbacks described in Section 2
+ - decrement the device's 'power.disable_depth' field; if that field is equal
+ to zero, the runtime PM helper functions can execute subsystem-level
+ callbacks described in Section 2 for the device
int pm_runtime_disable(struct device *dev);
- - prevent the run-time PM helper functions from running subsystem-level
- run-time PM callbacks for the device, make sure that all of the pending
- run-time PM operations on the device are either completed or canceled;
+ - increment the device's 'power.disable_depth' field (if the value of that
+ field was previously zero, this prevents subsystem-level runtime PM
+ callbacks from being run for the device), make sure that all of the pending
+ runtime PM operations on the device are either completed or canceled;
returns 1 if there was a resume request pending and it was necessary to
execute the subsystem-level resume callback for the device to satisfy that
request, otherwise 0 is returned
+ int pm_runtime_barrier(struct device *dev);
+ - check if there's a resume request pending for the device and resume it
+ (synchronously) in that case, cancel any other pending runtime PM requests
+ regarding it and wait for all runtime PM operations on it in progress to
+ complete; returns 1 if there was a resume request pending and it was
+ necessary to execute the subsystem-level resume callback for the device to
+ satisfy that request, otherwise 0 is returned
+
void pm_suspend_ignore_children(struct device *dev, bool enable);
- set/unset the power.ignore_children flag of the device
int pm_runtime_set_active(struct device *dev);
- - clear the device's 'power.runtime_error' flag, set the device's run-time
+ - clear the device's 'power.runtime_error' flag, set the device's runtime
PM status to 'active' and update its parent's counter of 'active'
children as appropriate (it is only valid to use this function if
'power.runtime_error' is set or 'power.disable_depth' is greater than
@@ -390,7 +402,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
which is not active and the 'power.ignore_children' flag of which is unset
void pm_runtime_set_suspended(struct device *dev);
- - clear the device's 'power.runtime_error' flag, set the device's run-time
+ - clear the device's 'power.runtime_error' flag, set the device's runtime
PM status to 'suspended' and update its parent's counter of 'active'
children as appropriate (it is only valid to use this function if
'power.runtime_error' is set or 'power.disable_depth' is greater than
@@ -400,6 +412,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
- return true if the device's runtime PM status is 'suspended' and its
'power.disable_depth' field is equal to zero, or false otherwise
+ bool pm_runtime_status_suspended(struct device *dev);
+ - return true if the device's runtime PM status is 'suspended'
+
void pm_runtime_allow(struct device *dev);
- set the power.runtime_auto flag for the device and decrease its usage
counter (used by the /sys/devices/.../power/control interface to
@@ -411,7 +426,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
effectively prevent the device from being power managed at run time)
void pm_runtime_no_callbacks(struct device *dev);
- - set the power.no_callbacks flag for the device and remove the run-time
+ - set the power.no_callbacks flag for the device and remove the runtime
PM attributes from /sys/devices/.../power (or prevent them from being
added when the device is registered)
@@ -431,7 +446,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
- set the power.autosuspend_delay value to 'delay' (expressed in
- milliseconds); if 'delay' is negative then run-time suspends are
+ milliseconds); if 'delay' is negative then runtime suspends are
prevented
unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
@@ -470,35 +485,35 @@ pm_runtime_resume()
pm_runtime_get_sync()
pm_runtime_put_sync_suspend()
-5. Run-time PM Initialization, Device Probing and Removal
+5. Runtime PM Initialization, Device Probing and Removal
-Initially, the run-time PM is disabled for all devices, which means that the
-majority of the run-time PM helper funtions described in Section 4 will return
+Initially, the runtime PM is disabled for all devices, which means that the
+majority of the runtime PM helper funtions described in Section 4 will return
-EAGAIN until pm_runtime_enable() is called for the device.
-In addition to that, the initial run-time PM status of all devices is
+In addition to that, the initial runtime PM status of all devices is
'suspended', but it need not reflect the actual physical state of the device.
Thus, if the device is initially active (i.e. it is able to process I/O), its
-run-time PM status must be changed to 'active', with the help of
+runtime PM status must be changed to 'active', with the help of
pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
-However, if the device has a parent and the parent's run-time PM is enabled,
+However, if the device has a parent and the parent's runtime PM is enabled,
calling pm_runtime_set_active() for the device will affect the parent, unless
the parent's 'power.ignore_children' flag is set. Namely, in that case the
parent won't be able to suspend at run time, using the PM core's helper
functions, as long as the child's status is 'active', even if the child's
-run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
+runtime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
the child yet or pm_runtime_disable() has been called for it). For this reason,
once pm_runtime_set_active() has been called for the device, pm_runtime_enable()
-should be called for it too as soon as reasonably possible or its run-time PM
+should be called for it too as soon as reasonably possible or its runtime PM
status should be changed back to 'suspended' with the help of
pm_runtime_set_suspended().
-If the default initial run-time PM status of the device (i.e. 'suspended')
+If the default initial runtime PM status of the device (i.e. 'suspended')
reflects the actual state of the device, its bus type's or its driver's
->probe() callback will likely need to wake it up using one of the PM core's
helper functions described in Section 4. In that case, pm_runtime_resume()
-should be used. Of course, for this purpose the device's run-time PM has to be
+should be used. Of course, for this purpose the device's runtime PM has to be
enabled earlier by calling pm_runtime_enable().
If the device bus type's or driver's ->probe() callback runs
@@ -529,33 +544,33 @@ The user space can effectively disallow the driver of the device to power manage
it at run time by changing the value of its /sys/devices/.../power/control
attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
this mechanism may also be used by the driver to effectively turn off the
-run-time power management of the device until the user space turns it on.
-Namely, during the initialization the driver can make sure that the run-time PM
+runtime power management of the device until the user space turns it on.
+Namely, during the initialization the driver can make sure that the runtime PM
status of the device is 'active' and call pm_runtime_forbid(). It should be
noted, however, that if the user space has already intentionally changed the
value of /sys/devices/.../power/control to "auto" to allow the driver to power
manage the device at run time, the driver may confuse it by using
pm_runtime_forbid() this way.
-6. Run-time PM and System Sleep
+6. Runtime PM and System Sleep
-Run-time PM and system sleep (i.e., system suspend and hibernation, also known
+Runtime PM and system sleep (i.e., system suspend and hibernation, also known
as suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
ways. If a device is active when a system sleep starts, everything is
straightforward. But what should happen if the device is already suspended?
-The device may have different wake-up settings for run-time PM and system sleep.
-For example, remote wake-up may be enabled for run-time suspend but disallowed
+The device may have different wake-up settings for runtime PM and system sleep.
+For example, remote wake-up may be enabled for runtime suspend but disallowed
for system sleep (device_may_wakeup(dev) returns 'false'). When this happens,
the subsystem-level system suspend callback is responsible for changing the
device's wake-up setting (it may leave that to the device driver's system
suspend routine). It may be necessary to resume the device and suspend it again
in order to do so. The same is true if the driver uses different power levels
-or other settings for run-time suspend and system sleep.
+or other settings for runtime suspend and system sleep.
-During system resume, devices generally should be brought back to full power,
-even if they were suspended before the system sleep began. There are several
-reasons for this, including:
+During system resume, the simplest approach is to bring all devices back to full
+power, even if they had been suspended before the system suspend began. There
+are several reasons for this, including:
* The device might need to switch power levels, wake-up settings, etc.
@@ -570,18 +585,50 @@ reasons for this, including:
* The device might need to be reset.
* Even though the device was suspended, if its usage counter was > 0 then most
- likely it would need a run-time resume in the near future anyway.
-
- * Always going back to full power is simplest.
+ likely it would need a runtime resume in the near future anyway.
-If the device was suspended before the sleep began, then its run-time PM status
-will have to be updated to reflect the actual post-system sleep status. The way
-to do this is:
+If the device had been suspended before the system suspend began and it's
+brought back to full power during resume, then its runtime PM status will have
+to be updated to reflect the actual post-system sleep status. The way to do
+this is:
pm_runtime_disable(dev);
pm_runtime_set_active(dev);
pm_runtime_enable(dev);
+The PM core always increments the runtime usage counter before calling the
+->suspend() callback and decrements it after calling the ->resume() callback.
+Hence disabling runtime PM temporarily like this will not cause any runtime
+suspend attempts to be permanently lost. If the usage count goes to zero
+following the return of the ->resume() callback, the ->runtime_idle() callback
+will be invoked as usual.
+
+On some systems, however, system sleep is not entered through a global firmware
+or hardware operation. Instead, all hardware components are put into low-power
+states directly by the kernel in a coordinated way. Then, the system sleep
+state effectively follows from the states the hardware components end up in
+and the system is woken up from that state by a hardware interrupt or a similar
+mechanism entirely under the kernel's control. As a result, the kernel never
+gives control away and the states of all devices during resume are precisely
+known to it. If that is the case and none of the situations listed above takes
+place (in particular, if the system is not waking up from hibernation), it may
+be more efficient to leave the devices that had been suspended before the system
+suspend began in the suspended state.
+
+The PM core does its best to reduce the probability of race conditions between
+the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
+out the following operations:
+
+ * During system suspend it calls pm_runtime_get_noresume() and
+ pm_runtime_barrier() for every device right before executing the
+ subsystem-level .suspend() callback for it. In addition to that it calls
+ pm_runtime_disable() for every device right after executing the
+ subsystem-level .suspend() callback for it.
+
+ * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync()
+ for every device right before and right after executing the subsystem-level
+ .resume() callback for it, respectively.
+
7. Generic subsystem callbacks
Subsystems may wish to conserve code space by using the set of generic power
@@ -606,40 +653,68 @@ driver/base/power/generic_ops.c:
callback provided by its driver and return its result, or return 0 if not
defined
+ int pm_generic_suspend_noirq(struct device *dev);
+ - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
+ callback provided by the device's driver and return its result, or return
+ 0 if not defined
+
int pm_generic_resume(struct device *dev);
- invoke the ->resume() callback provided by the driver of this device and,
if successful, change the device's runtime PM status to 'active'
+ int pm_generic_resume_noirq(struct device *dev);
+ - invoke the ->resume_noirq() callback provided by the driver of this device
+
int pm_generic_freeze(struct device *dev);
- if the device has not been suspended at run time, invoke the ->freeze()
callback provided by its driver and return its result, or return 0 if not
defined
+ int pm_generic_freeze_noirq(struct device *dev);
+ - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
+ callback provided by the device's driver and return its result, or return
+ 0 if not defined
+
int pm_generic_thaw(struct device *dev);
- if the device has not been suspended at run time, invoke the ->thaw()
callback provided by its driver and return its result, or return 0 if not
defined
+ int pm_generic_thaw_noirq(struct device *dev);
+ - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
+ callback provided by the device's driver and return its result, or return
+ 0 if not defined
+
int pm_generic_poweroff(struct device *dev);
- if the device has not been suspended at run time, invoke the ->poweroff()
callback provided by its driver and return its result, or return 0 if not
defined
+ int pm_generic_poweroff_noirq(struct device *dev);
+ - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
+ callback provided by the device's driver and return its result, or return
+ 0 if not defined
+
int pm_generic_restore(struct device *dev);
- invoke the ->restore() callback provided by the driver of this device and,
if successful, change the device's runtime PM status to 'active'
+ int pm_generic_restore_noirq(struct device *dev);
+ - invoke the ->restore_noirq() callback provided by the device's driver
+
These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(),
-->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(),
-or ->restore() callback pointers in the subsystem-level dev_pm_ops structures.
+->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(),
+->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(),
+->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() callback
+pointers in the subsystem-level dev_pm_ops structures.
If a subsystem wishes to use all of them at the same time, it can simply assign
the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its
dev_pm_ops structure pointer.
Device drivers that wish to use the same function as a system suspend, freeze,
-poweroff and run-time suspend callback, and similarly for system resume, thaw,
-restore, and run-time resume, can achieve this with the help of the
+poweroff and runtime suspend callback, and similarly for system resume, thaw,
+restore, and runtime resume, can achieve this with the help of the
UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
last argument to NULL).
@@ -649,7 +724,7 @@ Some "devices" are only logical sub-devices of their parent and cannot be
power-managed on their own. (The prototype example is a USB interface. Entire
USB devices can go into low-power mode or send wake-up requests, but neither is
possible for individual interfaces.) The drivers for these devices have no
-need of run-time PM callbacks; if the callbacks did exist, ->runtime_suspend()
+need of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend()
and ->runtime_resume() would always return 0 without doing anything else and
->runtime_idle() would always call pm_runtime_suspend().
@@ -657,7 +732,7 @@ Subsystems can tell the PM core about these devices by calling
pm_runtime_no_callbacks(). This should be done after the device structure is
initialized and before it is registered (although after device registration is
also okay). The routine will set the device's power.no_callbacks flag and
-prevent the non-debugging run-time PM sysfs attributes from being created.
+prevent the non-debugging runtime PM sysfs attributes from being created.
When power.no_callbacks is set, the PM core will not invoke the
->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
@@ -665,7 +740,7 @@ Instead it will assume that suspends and resumes always succeed and that idle
devices should be suspended.
As a consequence, the PM core will never directly inform the device's subsystem
-or driver about run-time power changes. Instead, the driver for the device's
+or driver about runtime power changes. Instead, the driver for the device's
parent must take responsibility for telling the device's driver when the
parent's power state changes.
@@ -676,13 +751,13 @@ A device should be put in a low-power state only when there's some reason to
think it will remain in that state for a substantial time. A common heuristic
says that a device which hasn't been used for a while is liable to remain
unused; following this advice, drivers should not allow devices to be suspended
-at run-time until they have been inactive for some minimum period. Even when
+at runtime until they have been inactive for some minimum period. Even when
the heuristic ends up being non-optimal, it will still prevent devices from
"bouncing" too rapidly between low-power and full-power states.
The term "autosuspend" is an historical remnant. It doesn't mean that the
device is automatically suspended (the subsystem or driver still has to call
-the appropriate PM routines); rather it means that run-time suspends will
+the appropriate PM routines); rather it means that runtime suspends will
automatically be delayed until the desired period of inactivity has elapsed.
Inactivity is determined based on the power.last_busy field. Drivers should
diff --git a/Documentation/spi/ep93xx_spi b/Documentation/spi/ep93xx_spi
index 6325f5b48635..d8eb01c15db1 100644
--- a/Documentation/spi/ep93xx_spi
+++ b/Documentation/spi/ep93xx_spi
@@ -88,6 +88,16 @@ static void __init ts72xx_init_machine(void)
ARRAY_SIZE(ts72xx_spi_devices));
}
+The driver can use DMA for the transfers also. In this case ts72xx_spi_info
+becomes:
+
+static struct ep93xx_spi_info ts72xx_spi_info = {
+ .num_chipselect = ARRAY_SIZE(ts72xx_spi_devices),
+ .use_dma = true;
+};
+
+Note that CONFIG_EP93XX_DMA should be enabled as well.
+
Thanks to
=========
Martin Guy, H. Hartley Sweeten and others who helped me during development of
diff --git a/Documentation/spi/pxa2xx b/Documentation/spi/pxa2xx
index 493dada57372..00511e08db78 100644
--- a/Documentation/spi/pxa2xx
+++ b/Documentation/spi/pxa2xx
@@ -22,15 +22,11 @@ Typically a SPI master is defined in the arch/.../mach-*/board-*.c as a
found in include/linux/spi/pxa2xx_spi.h:
struct pxa2xx_spi_master {
- enum pxa_ssp_type ssp_type;
u32 clock_enable;
u16 num_chipselect;
u8 enable_dma;
};
-The "pxa2xx_spi_master.ssp_type" field must have a value between 1 and 3 and
-informs the driver which features a particular SSP supports.
-
The "pxa2xx_spi_master.clock_enable" field is used to enable/disable the
corresponding SSP peripheral block in the "Clock Enable Register (CKEN"). See
the "PXA2xx Developer Manual" section "Clocks and Power Management".
@@ -61,7 +57,6 @@ static struct resource pxa_spi_nssp_resources[] = {
};
static struct pxa2xx_spi_master pxa_nssp_master_info = {
- .ssp_type = PXA25x_NSSP, /* Type of SSP */
.clock_enable = CKEN_NSSP, /* NSSP Peripheral clock */
.num_chipselect = 1, /* Matches the number of chips attached to NSSP */
.enable_dma = 1, /* Enables NSSP DMA */
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index c83bd6b4e6e8..d0d0bb9e3e25 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -22,14 +22,15 @@ current_tracer. Instead of that, add probe points via
Synopsis of kprobe_events
-------------------------
- p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
- r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+ p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
+ r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
-:[GRP/]EVENT : Clear a probe
GRP : Group name. If omitted, use "kprobes" for it.
EVENT : Event name. If omitted, the event name is generated
- based on SYMBOL+offs or MEMADDR.
- SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+ based on SYM+offs or MEMADDR.
+ MOD : Module name which has given SYM.
+ SYM[+offs] : Symbol+offset where the probe is inserted.
MEMADDR : Address where the probe is inserted.
FETCHARGS : Arguments. Each probe can have up to 128 args.
diff --git a/Documentation/vDSO/parse_vdso.c b/Documentation/vDSO/parse_vdso.c
new file mode 100644
index 000000000000..85870208edcf
--- /dev/null
+++ b/Documentation/vDSO/parse_vdso.c
@@ -0,0 +1,256 @@
+/*
+ * parse_vdso.c: Linux reference vDSO parser
+ * Written by Andrew Lutomirski, 2011.
+ *
+ * This code is meant to be linked in to various programs that run on Linux.
+ * As such, it is available with as few restrictions as possible. This file
+ * is licensed under the Creative Commons Zero License, version 1.0,
+ * available at http://creativecommons.org/publicdomain/zero/1.0/legalcode
+ *
+ * The vDSO is a regular ELF DSO that the kernel maps into user space when
+ * it starts a program. It works equally well in statically and dynamically
+ * linked binaries.
+ *
+ * This code is tested on x86_64. In principle it should work on any 64-bit
+ * architecture that has a vDSO.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+#include <elf.h>
+
+/*
+ * To use this vDSO parser, first call one of the vdso_init_* functions.
+ * If you've already parsed auxv, then pass the value of AT_SYSINFO_EHDR
+ * to vdso_init_from_sysinfo_ehdr. Otherwise pass auxv to vdso_init_from_auxv.
+ * Then call vdso_sym for each symbol you want. For example, to look up
+ * gettimeofday on x86_64, use:
+ *
+ * <some pointer> = vdso_sym("LINUX_2.6", "gettimeofday");
+ * or
+ * <some pointer> = vdso_sym("LINUX_2.6", "__vdso_gettimeofday");
+ *
+ * vdso_sym will return 0 if the symbol doesn't exist or if the init function
+ * failed or was not called. vdso_sym is a little slow, so its return value
+ * should be cached.
+ *
+ * vdso_sym is threadsafe; the init functions are not.
+ *
+ * These are the prototypes:
+ */
+extern void vdso_init_from_auxv(void *auxv);
+extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
+extern void *vdso_sym(const char *version, const char *name);
+
+
+/* And here's the code. */
+
+#ifndef __x86_64__
+# error Not yet ported to non-x86_64 architectures
+#endif
+
+static struct vdso_info
+{
+ bool valid;
+
+ /* Load information */
+ uintptr_t load_addr;
+ uintptr_t load_offset; /* load_addr - recorded vaddr */
+
+ /* Symbol table */
+ Elf64_Sym *symtab;
+ const char *symstrings;
+ Elf64_Word *bucket, *chain;
+ Elf64_Word nbucket, nchain;
+
+ /* Version table */
+ Elf64_Versym *versym;
+ Elf64_Verdef *verdef;
+} vdso_info;
+
+/* Straight from the ELF specification. */
+static unsigned long elf_hash(const unsigned char *name)
+{
+ unsigned long h = 0, g;
+ while (*name)
+ {
+ h = (h << 4) + *name++;
+ if (g = h & 0xf0000000)
+ h ^= g >> 24;
+ h &= ~g;
+ }
+ return h;
+}
+
+void vdso_init_from_sysinfo_ehdr(uintptr_t base)
+{
+ size_t i;
+ bool found_vaddr = false;
+
+ vdso_info.valid = false;
+
+ vdso_info.load_addr = base;
+
+ Elf64_Ehdr *hdr = (Elf64_Ehdr*)base;
+ Elf64_Phdr *pt = (Elf64_Phdr*)(vdso_info.load_addr + hdr->e_phoff);
+ Elf64_Dyn *dyn = 0;
+
+ /*
+ * We need two things from the segment table: the load offset
+ * and the dynamic table.
+ */
+ for (i = 0; i < hdr->e_phnum; i++)
+ {
+ if (pt[i].p_type == PT_LOAD && !found_vaddr) {
+ found_vaddr = true;
+ vdso_info.load_offset = base
+ + (uintptr_t)pt[i].p_offset
+ - (uintptr_t)pt[i].p_vaddr;
+ } else if (pt[i].p_type == PT_DYNAMIC) {
+ dyn = (Elf64_Dyn*)(base + pt[i].p_offset);
+ }
+ }
+
+ if (!found_vaddr || !dyn)
+ return; /* Failed */
+
+ /*
+ * Fish out the useful bits of the dynamic table.
+ */
+ Elf64_Word *hash = 0;
+ vdso_info.symstrings = 0;
+ vdso_info.symtab = 0;
+ vdso_info.versym = 0;
+ vdso_info.verdef = 0;
+ for (i = 0; dyn[i].d_tag != DT_NULL; i++) {
+ switch (dyn[i].d_tag) {
+ case DT_STRTAB:
+ vdso_info.symstrings = (const char *)
+ ((uintptr_t)dyn[i].d_un.d_ptr
+ + vdso_info.load_offset);
+ break;
+ case DT_SYMTAB:
+ vdso_info.symtab = (Elf64_Sym *)
+ ((uintptr_t)dyn[i].d_un.d_ptr
+ + vdso_info.load_offset);
+ break;
+ case DT_HASH:
+ hash = (Elf64_Word *)
+ ((uintptr_t)dyn[i].d_un.d_ptr
+ + vdso_info.load_offset);
+ break;
+ case DT_VERSYM:
+ vdso_info.versym = (Elf64_Versym *)
+ ((uintptr_t)dyn[i].d_un.d_ptr
+ + vdso_info.load_offset);
+ break;
+ case DT_VERDEF:
+ vdso_info.verdef = (Elf64_Verdef *)
+ ((uintptr_t)dyn[i].d_un.d_ptr
+ + vdso_info.load_offset);
+ break;
+ }
+ }
+ if (!vdso_info.symstrings || !vdso_info.symtab || !hash)
+ return; /* Failed */
+
+ if (!vdso_info.verdef)
+ vdso_info.versym = 0;
+
+ /* Parse the hash table header. */
+ vdso_info.nbucket = hash[0];
+ vdso_info.nchain = hash[1];
+ vdso_info.bucket = &hash[2];
+ vdso_info.chain = &hash[vdso_info.nbucket + 2];
+
+ /* That's all we need. */
+ vdso_info.valid = true;
+}
+
+static bool vdso_match_version(Elf64_Versym ver,
+ const char *name, Elf64_Word hash)
+{
+ /*
+ * This is a helper function to check if the version indexed by
+ * ver matches name (which hashes to hash).
+ *
+ * The version definition table is a mess, and I don't know how
+ * to do this in better than linear time without allocating memory
+ * to build an index. I also don't know why the table has
+ * variable size entries in the first place.
+ *
+ * For added fun, I can't find a comprehensible specification of how
+ * to parse all the weird flags in the table.
+ *
+ * So I just parse the whole table every time.
+ */
+
+ /* First step: find the version definition */
+ ver &= 0x7fff; /* Apparently bit 15 means "hidden" */
+ Elf64_Verdef *def = vdso_info.verdef;
+ while(true) {
+ if ((def->vd_flags & VER_FLG_BASE) == 0
+ && (def->vd_ndx & 0x7fff) == ver)
+ break;
+
+ if (def->vd_next == 0)
+ return false; /* No definition. */
+
+ def = (Elf64_Verdef *)((char *)def + def->vd_next);
+ }
+
+ /* Now figure out whether it matches. */
+ Elf64_Verdaux *aux = (Elf64_Verdaux*)((char *)def + def->vd_aux);
+ return def->vd_hash == hash
+ && !strcmp(name, vdso_info.symstrings + aux->vda_name);
+}
+
+void *vdso_sym(const char *version, const char *name)
+{
+ unsigned long ver_hash;
+ if (!vdso_info.valid)
+ return 0;
+
+ ver_hash = elf_hash(version);
+ Elf64_Word chain = vdso_info.bucket[elf_hash(name) % vdso_info.nbucket];
+
+ for (; chain != STN_UNDEF; chain = vdso_info.chain[chain]) {
+ Elf64_Sym *sym = &vdso_info.symtab[chain];
+
+ /* Check for a defined global or weak function w/ right name. */
+ if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC)
+ continue;
+ if (ELF64_ST_BIND(sym->st_info) != STB_GLOBAL &&
+ ELF64_ST_BIND(sym->st_info) != STB_WEAK)
+ continue;
+ if (sym->st_shndx == SHN_UNDEF)
+ continue;
+ if (strcmp(name, vdso_info.symstrings + sym->st_name))
+ continue;
+
+ /* Check symbol version. */
+ if (vdso_info.versym
+ && !vdso_match_version(vdso_info.versym[chain],
+ version, ver_hash))
+ continue;
+
+ return (void *)(vdso_info.load_offset + sym->st_value);
+ }
+
+ return 0;
+}
+
+void vdso_init_from_auxv(void *auxv)
+{
+ Elf64_auxv_t *elf_auxv = auxv;
+ for (int i = 0; elf_auxv[i].a_type != AT_NULL; i++)
+ {
+ if (elf_auxv[i].a_type == AT_SYSINFO_EHDR) {
+ vdso_init_from_sysinfo_ehdr(elf_auxv[i].a_un.a_val);
+ return;
+ }
+ }
+
+ vdso_info.valid = false;
+}
diff --git a/Documentation/vDSO/vdso_test.c b/Documentation/vDSO/vdso_test.c
new file mode 100644
index 000000000000..fff633432dff
--- /dev/null
+++ b/Documentation/vDSO/vdso_test.c
@@ -0,0 +1,111 @@
+/*
+ * vdso_test.c: Sample code to test parse_vdso.c on x86_64
+ * Copyright (c) 2011 Andy Lutomirski
+ * Subject to the GNU General Public License, version 2
+ *
+ * You can amuse yourself by compiling with:
+ * gcc -std=gnu99 -nostdlib
+ * -Os -fno-asynchronous-unwind-tables -flto
+ * vdso_test.c parse_vdso.c -o vdso_test
+ * to generate a small binary with no dependencies at all.
+ */
+
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+
+extern void *vdso_sym(const char *version, const char *name);
+extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
+extern void vdso_init_from_auxv(void *auxv);
+
+/* We need a libc functions... */
+int strcmp(const char *a, const char *b)
+{
+ /* This implementation is buggy: it never returns -1. */
+ while (*a || *b) {
+ if (*a != *b)
+ return 1;
+ if (*a == 0 || *b == 0)
+ return 1;
+ a++;
+ b++;
+ }
+
+ return 0;
+}
+
+/* ...and two syscalls. This is x86_64-specific. */
+static inline long linux_write(int fd, const void *data, size_t len)
+{
+
+ long ret;
+ asm volatile ("syscall" : "=a" (ret) : "a" (__NR_write),
+ "D" (fd), "S" (data), "d" (len) :
+ "cc", "memory", "rcx",
+ "r8", "r9", "r10", "r11" );
+ return ret;
+}
+
+static inline void linux_exit(int code)
+{
+ asm volatile ("syscall" : : "a" (__NR_exit), "D" (code));
+}
+
+void to_base10(char *lastdig, uint64_t n)
+{
+ while (n) {
+ *lastdig = (n % 10) + '0';
+ n /= 10;
+ lastdig--;
+ }
+}
+
+__attribute__((externally_visible)) void c_main(void **stack)
+{
+ /* Parse the stack */
+ long argc = (long)*stack;
+ stack += argc + 2;
+
+ /* Now we're pointing at the environment. Skip it. */
+ while(*stack)
+ stack++;
+ stack++;
+
+ /* Now we're pointing at auxv. Initialize the vDSO parser. */
+ vdso_init_from_auxv((void *)stack);
+
+ /* Find gettimeofday. */
+ typedef long (*gtod_t)(struct timeval *tv, struct timezone *tz);
+ gtod_t gtod = (gtod_t)vdso_sym("LINUX_2.6", "__vdso_gettimeofday");
+
+ if (!gtod)
+ linux_exit(1);
+
+ struct timeval tv;
+ long ret = gtod(&tv, 0);
+
+ if (ret == 0) {
+ char buf[] = "The time is .000000\n";
+ to_base10(buf + 31, tv.tv_sec);
+ to_base10(buf + 38, tv.tv_usec);
+ linux_write(1, buf, sizeof(buf) - 1);
+ } else {
+ linux_exit(ret);
+ }
+
+ linux_exit(0);
+}
+
+/*
+ * This is the real entry point. It passes the initial stack into
+ * the C entry point.
+ */
+asm (
+ ".text\n"
+ ".global _start\n"
+ ".type _start,@function\n"
+ "_start:\n\t"
+ "mov %rsp,%rdi\n\t"
+ "jmp c_main"
+ );
diff --git a/Documentation/virtual/lguest/lguest.c b/Documentation/virtual/lguest/lguest.c
index cd9d6af61d07..043bd7df3139 100644
--- a/Documentation/virtual/lguest/lguest.c
+++ b/Documentation/virtual/lguest/lguest.c
@@ -51,7 +51,7 @@
#include <asm/bootparam.h>
#include "../../../include/linux/lguest_launcher.h"
/*L:110
- * We can ignore the 42 include files we need for this program, but I do want
+ * We can ignore the 43 include files we need for this program, but I do want
* to draw attention to the use of kernel-style types.
*
* As Linus said, "C is a Spartan language, and so should your naming be." I
@@ -65,7 +65,6 @@ typedef uint16_t u16;
typedef uint8_t u8;
/*:*/
-#define PAGE_PRESENT 0x7 /* Present, RW, Execute */
#define BRIDGE_PFX "bridge:"
#ifndef SIOCBRADDIF
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
@@ -861,8 +860,10 @@ static void console_output(struct virtqueue *vq)
/* writev can return a partial write, so we loop here. */
while (!iov_empty(iov, out)) {
int len = writev(STDOUT_FILENO, iov, out);
- if (len <= 0)
- err(1, "Write to stdout gave %i", len);
+ if (len <= 0) {
+ warn("Write to stdout gave %i (%d)", len, errno);
+ break;
+ }
iov_consume(iov, out, len);
}
@@ -898,7 +899,7 @@ static void net_output(struct virtqueue *vq)
* same format: what a coincidence!
*/
if (writev(net_info->tunfd, iov, out) < 0)
- errx(1, "Write to tun failed?");
+ warnx("Write to tun failed (%d)?", errno);
/*
* Done with that one; wait_for_vq_desc() will send the interrupt if
@@ -955,7 +956,7 @@ static void net_input(struct virtqueue *vq)
*/
len = readv(net_info->tunfd, iov, in);
if (len <= 0)
- err(1, "Failed to read from tun.");
+ warn("Failed to read from tun (%d).", errno);
/*
* Mark that packet buffer as used, but don't interrupt here. We want
@@ -1093,9 +1094,10 @@ static void update_device_status(struct device *dev)
warnx("Device %s configuration FAILED", dev->name);
if (dev->running)
reset_device(dev);
- } else if (dev->desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
- if (!dev->running)
- start_device(dev);
+ } else {
+ if (dev->running)
+ err(1, "Device %s features finalized twice", dev->name);
+ start_device(dev);
}
}
@@ -1120,25 +1122,11 @@ static void handle_output(unsigned long addr)
return;
}
- /*
- * Devices *can* be used before status is set to DRIVER_OK.
- * The original plan was that they would never do this: they
- * would always finish setting up their status bits before
- * actually touching the virtqueues. In practice, we allowed
- * them to, and they do (eg. the disk probes for partition
- * tables as part of initialization).
- *
- * If we see this, we start the device: once it's running, we
- * expect the device to catch all the notifications.
- */
+ /* Devices should not be used before features are finalized. */
for (vq = i->vq; vq; vq = vq->next) {
if (addr != vq->config.pfn*getpagesize())
continue;
- if (i->running)
- errx(1, "Notification on running %s", i->name);
- /* This just calls create_thread() for each virtqueue */
- start_device(i);
- return;
+ errx(1, "Notification on %s before setup!", i->name);
}
}
@@ -1370,7 +1358,7 @@ static void setup_console(void)
* --sharenet=<name> option which opens or creates a named pipe. This can be
* used to send packets to another guest in a 1:1 manner.
*
- * More sopisticated is to use one of the tools developed for project like UML
+ * More sophisticated is to use one of the tools developed for project like UML
* to do networking.
*
* Faster is to do virtio bonding in kernel. Doing this 1:1 would be
@@ -1380,7 +1368,7 @@ static void setup_console(void)
* multiple inter-guest channels behind one interface, although it would
* require some manner of hotplugging new virtio channels.
*
- * Finally, we could implement a virtio network switch in the kernel.
+ * Finally, we could use a virtio network switch in the kernel, ie. vhost.
:*/
static u32 str2ip(const char *ipaddr)
@@ -2017,10 +2005,7 @@ int main(int argc, char *argv[])
/* Tell the entry path not to try to reload segment registers. */
boot->hdr.loadflags |= KEEP_SEGMENTS;
- /*
- * We tell the kernel to initialize the Guest: this returns the open
- * /dev/lguest file descriptor.
- */
+ /* We tell the kernel to initialize the Guest. */
tell_kernel(start);
/* Ensure that we terminate if a device-servicing child dies. */
diff --git a/Documentation/x86/entry_64.txt b/Documentation/x86/entry_64.txt
new file mode 100644
index 000000000000..7869f14d055c
--- /dev/null
+++ b/Documentation/x86/entry_64.txt
@@ -0,0 +1,98 @@
+This file documents some of the kernel entries in
+arch/x86/kernel/entry_64.S. A lot of this explanation is adapted from
+an email from Ingo Molnar:
+
+http://lkml.kernel.org/r/<20110529191055.GC9835%40elte.hu>
+
+The x86 architecture has quite a few different ways to jump into
+kernel code. Most of these entry points are registered in
+arch/x86/kernel/traps.c and implemented in arch/x86/kernel/entry_64.S
+and arch/x86/ia32/ia32entry.S.
+
+The IDT vector assignments are listed in arch/x86/include/irq_vectors.h.
+
+Some of these entries are:
+
+ - system_call: syscall instruction from 64-bit code.
+
+ - ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
+ either way.
+
+ - ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit
+ code
+
+ - interrupt: An array of entries. Every IDT vector that doesn't
+ explicitly point somewhere else gets set to the corresponding
+ value in interrupts. These point to a whole array of
+ magically-generated functions that make their way to do_IRQ with
+ the interrupt number as a parameter.
+
+ - emulate_vsyscall: int 0xcc, a special non-ABI entry used by
+ vsyscall emulation.
+
+ - APIC interrupts: Various special-purpose interrupts for things
+ like TLB shootdown.
+
+ - Architecturally-defined exceptions like divide_error.
+
+There are a few complexities here. The different x86-64 entries
+have different calling conventions. The syscall and sysenter
+instructions have their own peculiar calling conventions. Some of
+the IDT entries push an error code onto the stack; others don't.
+IDT entries using the IST alternative stack mechanism need their own
+magic to get the stack frames right. (You can find some
+documentation in the AMD APM, Volume 2, Chapter 8 and the Intel SDM,
+Volume 3, Chapter 6.)
+
+Dealing with the swapgs instruction is especially tricky. Swapgs
+toggles whether gs is the kernel gs or the user gs. The swapgs
+instruction is rather fragile: it must nest perfectly and only in
+single depth, it should only be used if entering from user mode to
+kernel mode and then when returning to user-space, and precisely
+so. If we mess that up even slightly, we crash.
+
+So when we have a secondary entry, already in kernel mode, we *must
+not* use SWAPGS blindly - nor must we forget doing a SWAPGS when it's
+not switched/swapped yet.
+
+Now, there's a secondary complication: there's a cheap way to test
+which mode the CPU is in and an expensive way.
+
+The cheap way is to pick this info off the entry frame on the kernel
+stack, from the CS of the ptregs area of the kernel stack:
+
+ xorl %ebx,%ebx
+ testl $3,CS+8(%rsp)
+ je error_kernelspace
+ SWAPGS
+
+The expensive (paranoid) way is to read back the MSR_GS_BASE value
+(which is what SWAPGS modifies):
+
+ movl $1,%ebx
+ movl $MSR_GS_BASE,%ecx
+ rdmsr
+ testl %edx,%edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx,%ebx
+1: ret
+
+and the whole paranoid non-paranoid macro complexity is about whether
+to suffer that RDMSR cost.
+
+If we are at an interrupt or user-trap/gate-alike boundary then we can
+use the faster check: the stack will be a reliable indicator of
+whether SWAPGS was already done: if we see that we are a secondary
+entry interrupting kernel mode execution, then we know that the GS
+base has already been switched. If it says that we interrupted
+user-space execution then we must do the SWAPGS.
+
+But if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
+which might have triggered right after a normal entry wrote CS to the
+stack but before we executed SWAPGS, then the only safe way to check
+for GS is the slower method: the RDMSR.
+
+So we try only to mark those entry methods 'paranoid' that absolutely
+need the more expensive check for the GS base - and we generate all
+'normal' entry points with the regular (faster) entry macros.
diff --git a/Documentation/zh_CN/SubmitChecklist b/Documentation/zh_CN/SubmitChecklist
index 951415bbab0c..4c741d6bc048 100644
--- a/Documentation/zh_CN/SubmitChecklist
+++ b/Documentation/zh_CN/SubmitChecklist
@@ -67,7 +67,7 @@ Linux内核提交清单
12:已经通过CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP测试,并且同时都
+ CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP测试,并且同时都
使能。
13:已经都构建并且使用或者不使用 CONFIG_SMP 和 CONFIG_PREEMPT测试执行时间。