165 files changed, 5349 insertions, 1121 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 4fb76c0e8d30..1528239f69b2 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -484,6 +484,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/spectre_v2 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass /sys/devices/system/cpu/vulnerabilities/l1tf + /sys/devices/system/cpu/vulnerabilities/mds Date: January 2018 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org> Description: Information about CPU vulnerabilities @@ -496,8 +497,7 @@ Description: Information about CPU vulnerabilities "Vulnerable" CPU is affected and no mitigation in effect "Mitigation: $M" CPU is affected and mitigation $M is in effect - Details about the l1tf file can be found in - Documentation/admin-guide/l1tf.rst + See also: Documentation/admin-guide/hw-vuln/index.rst What: /sys/devices/system/cpu/smt /sys/devices/system/cpu/smt/active diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst new file mode 100644 index 000000000000..ffc064c1ec68 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/index.rst @@ -0,0 +1,13 @@ +======================== +Hardware vulnerabilities +======================== + +This section describes CPU vulnerabilities and provides an overview of the +possible mitigations along with guidance for selecting mitigations if they +are configurable at compile, boot or run time. + +.. toctree:: + :maxdepth: 1 + + l1tf + mds diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst index 9af977384168..31653a9f0e1b 100644 --- a/Documentation/admin-guide/l1tf.rst +++ b/Documentation/admin-guide/hw-vuln/l1tf.rst @@ -445,6 +445,7 @@ The default is 'cond'. If 'l1tf=full,force' is given on the kernel command line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush module parameter is ignored and writes to the sysfs file are rejected. +.. _mitigation_selection: Mitigation selection guide -------------------------- diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst new file mode 100644 index 000000000000..e3a796c0d3a2 --- /dev/null +++ b/Documentation/admin-guide/hw-vuln/mds.rst @@ -0,0 +1,308 @@ +MDS - Microarchitectural Data Sampling +====================================== + +Microarchitectural Data Sampling is a hardware vulnerability which allows +unprivileged speculative access to data which is available in various CPU +internal buffers. + +Affected processors +------------------- + +This vulnerability affects a wide range of Intel processors. The +vulnerability is not present on: + + - Processors from AMD, Centaur and other non-Intel vendors + + - Older processor models, where the CPU family is < 6 + + - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus) + + - Intel processors which have the ARCH_CAP_MDS_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. + +Whether a processor is affected or not can be read out from the MDS +vulnerability file in sysfs. See :ref:`mds_sys_info`. + +Not all processors are affected by all variants of MDS, but the mitigation +is identical for all of them, so the kernel treats them as a single +vulnerability.
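For example, a quick check from user space only needs to read that file. A minimal sketch in C (error handling trimmed for brevity; on kernels without MDS support the file simply does not exist)::

   #include <stdio.h>

   int main(void)
   {
           /* Path as documented in sysfs-devices-system-cpu above */
           FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/mds", "r");
           char line[256];

           if (!f) {
                   perror("mds");  /* older kernels do not provide the file */
                   return 1;
           }
           if (fgets(line, sizeof(line), f))
                   /* e.g. "Mitigation: Clear CPU buffers; SMT disabled" */
                   fputs(line, stdout);
           fclose(f);
           return 0;
   }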
+ +Related CVEs +------------ + +The following CVE entries are related to the MDS vulnerability: + + ============== ===== =================================================== + CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling + CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling + CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling + CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory + ============== ===== =================================================== + +Problem +------- + +When performing store, load or L1 refill operations, processors write data +into temporary microarchitectural structures (buffers). The data in the +buffer can be forwarded to load operations as an optimization. + +Under certain conditions, usually a fault/assist caused by a load +operation, data unrelated to the load memory address can be speculatively +forwarded from the buffers. Because the load operation causes a fault or +assist and its result will be discarded, the forwarded data will not cause +incorrect program execution or state changes. But a malicious operation +may be able to forward this speculative data to a disclosure gadget, which +in turn allows inferring the value via a cache side channel attack. + +Because the buffers are potentially shared between Hyper-Threads, cross +Hyper-Thread attacks are possible. + +Deeper technical information is available in the MDS specific x86 +architecture section: :ref:`Documentation/x86/mds.rst <mds>`. + + +Attack scenarios +---------------- + +Attacks against the MDS vulnerabilities can be mounted from malicious +non-privileged user space applications running on hosts or guests. Malicious +guest OSes can obviously mount attacks as well. + +Contrary to other speculation based vulnerabilities, the MDS vulnerability +does not allow the attacker to control the memory target address. As a +consequence the attacks are purely sampling based, but as demonstrated with +the TLBleed attack, samples can be postprocessed successfully. + +Web-Browsers +^^^^^^^^^^^^ + + It's unclear whether attacks through Web-Browsers are possible at + all. Exploitation through JavaScript is considered very unlikely, + but other widely used web technologies like WebAssembly could possibly be + abused. + + +.. _mds_sys_info: + +MDS system information +---------------------- + +The Linux kernel provides a sysfs interface to enumerate the current MDS +status of the system: whether the system is vulnerable, and which +mitigations are active. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/mds + +The possible values in this file are: + + .. list-table:: + + * - 'Not affected' + - The processor is not vulnerable + * - 'Vulnerable' + - The processor is vulnerable, but no mitigation is enabled + * - 'Vulnerable: Clear CPU buffers attempted, no microcode' + - The processor is vulnerable, but the microcode is not updated. + + The mitigation is enabled on a best effort basis. See :ref:`vmwerv` + * - 'Mitigation: Clear CPU buffers' + - The processor is vulnerable and the CPU buffer clearing mitigation is + enabled.
+ +If the processor is vulnerable, then the following information is appended +to the above information: + + ======================== ============================================ + 'SMT vulnerable' SMT is enabled + 'SMT mitigated' SMT is enabled and mitigated + 'SMT disabled' SMT is disabled + 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown + ======================== ============================================ + +.. _vmwerv: + +Best effort mitigation mode +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the processor is vulnerable, but the availability of the microcode based + mitigation mechanism is not advertised via CPUID, the kernel selects a best + effort mitigation mode. This mode invokes the mitigation instructions + without a guarantee that they clear the CPU buffers. + + This is done to address virtualization scenarios where the host has the + microcode update applied, but the hypervisor is not yet updated to expose + the CPUID to the guest. If the host has updated microcode, the protection + takes effect; otherwise a few CPU cycles are wasted pointlessly. + + The state in the mds sysfs file reflects this situation accordingly. + + +Mitigation mechanism +-------------------- + +The kernel detects the affected CPUs and the presence of the required +microcode. + +If a CPU is affected and the microcode is available, then the kernel +enables the mitigation by default. The mitigation can be controlled at boot +time via a kernel command line option. See +:ref:`mds_mitigation_control_command_line`. + +.. _cpu_buffer_clear: + +CPU buffer clearing +^^^^^^^^^^^^^^^^^^^ + + The mitigation for MDS clears the affected CPU buffers on return to user + space and when entering a guest. + + If SMT is enabled, it also clears the buffers on idle entry when the CPU + is only affected by MSBDS and not any other MDS variant, because the + other variants cannot be protected against cross Hyper-Thread attacks. + + For CPUs which are only affected by MSBDS, the user space, guest and idle + transition mitigations are sufficient and SMT is not affected. + +.. _virt_mechanism: + +Virtualization mitigation +^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection for the host to guest transition depends on the L1TF + vulnerability of the CPU: + + - CPU is affected by L1TF: + + If the L1D flush mitigation is enabled and up to date microcode is + available, the L1D flush mitigation automatically protects the + guest transition. + + If the L1D flush mitigation is disabled, then the MDS mitigation is + invoked explicitly when the host MDS mitigation is enabled. + + For details on L1TF and virtualization see: + :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`. + + - CPU is not affected by L1TF: + + CPU buffers are flushed before entering the guest when the host MDS + mitigation is enabled. + + The resulting MDS protection matrix for the host to guest transition: + + ============ ===== ============= ============ ================= + L1TF MDS VMX-L1FLUSH Host MDS MDS-State + + Don't care No Don't care N/A Not affected + + Yes Yes Disabled Off Vulnerable + + Yes Yes Disabled Full Mitigated + + Yes Yes Enabled Don't care Mitigated + + No Yes N/A Off Vulnerable + + No Yes N/A Full Mitigated + ============ ===== ============= ============ ================= + + This only covers the host to guest transition, i.e. it prevents leakage from + host to guest, but does not protect the guest internally. Guests need to + have their own protections. + +..
_xeon_phi: + +XEON PHI specific considerations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The XEON PHI processor family is affected by MSBDS, which can be exploited + cross Hyper-Threads when entering idle states. Some XEON PHI variants allow + the use of MWAIT in user space (Ring 3), which opens a potential attack vector + for malicious user space. The exposure can be disabled on the kernel + command line with the 'ring3mwait=disable' command line option. + + XEON PHI is not affected by the other MDS variants and MSBDS is mitigated + before the CPU enters an idle state. As XEON PHI is not affected by L1TF + either, disabling SMT is not required for full protection. + +.. _mds_smt_control: + +SMT control +^^^^^^^^^^^ + + All MDS variants except MSBDS can be attacked cross Hyper-Threads. That + means on CPUs which are affected by MFBDS or MLPDS, it is necessary to + disable SMT for full protection. These are most of the affected CPUs; the + exception is XEON PHI, see :ref:`xeon_phi`. + + Disabling SMT can have a significant performance impact, but the impact + depends on the type of workload. + + See the relevant chapter in the L1TF mitigation documentation for details: + :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`. + + +.. _mds_mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows controlling the MDS mitigations at boot +time with the option "mds=". The valid arguments for this option are: + + ============ ============================================================= + full If the CPU is vulnerable, enable all available mitigations + for the MDS vulnerability: CPU buffer clearing on exit to + userspace and when entering a VM. Idle transitions are + protected as well if SMT is enabled. + + It does not automatically disable SMT. + + full,nosmt The same as mds=full, with SMT disabled on vulnerable + CPUs. This is the complete mitigation. + + off Disables MDS mitigations completely. + + ============ ============================================================= + +Not specifying this option is equivalent to "mds=full". + + +Mitigation selection guide +-------------------------- + +1. Trusted userspace +^^^^^^^^^^^^^^^^^^^^ + + If all userspace applications are from a trusted source and do not + execute untrusted code which is supplied externally, then the mitigation + can be disabled. + + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The same considerations as for trusted userspace above apply. + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The protection depends on the state of the L1TF mitigations. + See :ref:`virt_mechanism`. + + If the MDS mitigation is enabled and SMT is disabled, guest to host and + guest to guest attacks are prevented. + +.. _mds_default_mitigations: + +Default mitigations +------------------- + + The kernel default mitigations for vulnerable processors are: + + - Enable CPU buffer clearing + + The kernel does not by default enforce the disabling of SMT, which leaves + SMT systems vulnerable when running untrusted code. The same rationale as + for L1TF applies. + See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`. diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 5b8286fdd91b..8001917ee012 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -17,14 +17,12 @@ etc.
kernel-parameters devices -This section describes CPU vulnerabilities and provides an overview of the -possible mitigations along with guidance for selecting mitigations if they -are configurable at compile, boot or run time. +This section describes CPU vulnerabilities and their mitigations. .. toctree:: :maxdepth: 1 - l1tf + hw-vuln/index Here is a set of documents aimed at users who are trying to track down problems and bugs in particular. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 08df58805703..43176340c73d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2143,7 +2143,7 @@ Default is 'flush'. - For details see: Documentation/admin-guide/l1tf.rst + For details see: Documentation/admin-guide/hw-vuln/l1tf.rst l2cr= [PPC] @@ -2389,6 +2389,32 @@ Format: <first>,<last> Specifies range of consoles to be captured by the MDA. + mds= [X86,INTEL] + Control mitigation for the Micro-architectural Data + Sampling (MDS) vulnerability. + + Certain CPUs are vulnerable to an exploit against CPU + internal buffers which can forward information to a + disclosure gadget under certain conditions. + + In vulnerable processors, the speculatively + forwarded data can be used in a cache side channel + attack to access data to which the attacker does + not have direct access. + + This parameter controls the MDS mitigation. The + options are: + + full - Enable MDS mitigation on vulnerable CPUs + full,nosmt - Enable MDS mitigation and disable + SMT on vulnerable CPUs + off - Unconditionally disable MDS mitigation + + Not specifying this option is equivalent to + mds=full. + + For details see: Documentation/admin-guide/hw-vuln/mds.rst + mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory Amount of memory to be used when the kernel is not able to see the whole system memory or for test. @@ -2565,6 +2591,7 @@ spec_store_bypass_disable=off [X86,PPC] ssbd=force-off [ARM64] l1tf=off [X86] + mds=off [X86] auto (default) Mitigate all CPU vulnerabilities, but leave SMT @@ -2579,6 +2606,7 @@ if needed. This is for users who always want to be fully mitigated, even if it means losing SMT. Equivalent to: l1tf=flush,nosmt [X86] + mds=full,nosmt [X86] mminit_loglevel= [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this diff --git a/Documentation/devicetree/bindings/input/gpio-vibrator.yaml b/Documentation/devicetree/bindings/input/gpio-vibrator.yaml new file mode 100644 index 000000000000..903475f52dbd --- /dev/null +++ b/Documentation/devicetree/bindings/input/gpio-vibrator.yaml @@ -0,0 +1,37 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/bindings/input/gpio-vibrator.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: GPIO vibrator + +maintainers: + - Luca Weiss <luca@z3ntu.xyz> + +description: |+ + Registers a GPIO device as a vibrator, where the on/off capability is controlled by a GPIO.
+ +properties: + compatible: + const: gpio-vibrator + + enable-gpios: + maxItems: 1 + + vcc-supply: + description: Regulator that provides power + +required: + - compatible + - enable-gpios + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + + vibrator { + compatible = "gpio-vibrator"; + enable-gpios = <&msmgpio 86 GPIO_ACTIVE_HIGH>; + vcc-supply = <&pm8941_l18>; + }; diff --git a/Documentation/devicetree/bindings/input/lpc32xx-key.txt b/Documentation/devicetree/bindings/input/lpc32xx-key.txt index bcf62f856358..2b075a080d30 100644 --- a/Documentation/devicetree/bindings/input/lpc32xx-key.txt +++ b/Documentation/devicetree/bindings/input/lpc32xx-key.txt @@ -8,6 +8,7 @@ Required Properties: - reg: Physical base address of the controller and length of memory mapped region. - interrupts: The interrupt number to the cpu. +- clocks: phandle to clock controller plus clock-specifier pair - nxp,debounce-delay-ms: Debounce delay in ms - nxp,scan-delay-ms: Repeated scan period in ms - linux,keymap: the key-code to be reported when the key is pressed @@ -22,7 +23,9 @@ Example: key@40050000 { compatible = "nxp,lpc3220-key"; reg = <0x40050000 0x1000>; - interrupts = <54 0>; + clocks = <&clk LPC32XX_CLK_KEY>; + interrupt-parent = <&sic1>; + interrupts = <22 IRQ_TYPE_LEVEL_HIGH>; keypad,num-rows = <1>; keypad,num-columns = <1>; nxp,debounce-delay-ms = <3>; diff --git a/Documentation/devicetree/bindings/input/microchip,qt1050.txt b/Documentation/devicetree/bindings/input/microchip,qt1050.txt new file mode 100644 index 000000000000..80e75f96252b --- /dev/null +++ b/Documentation/devicetree/bindings/input/microchip,qt1050.txt @@ -0,0 +1,78 @@ +Microchip AT42QT1050 Five-channel Touch Sensor IC + +The AT42QT1050 (QT1050) is a QTouchADC sensor device. The device can sense from +one to five keys, dependent on mode. The QT1050 includes all signal processing +functions necessary to provide stable sensing under a wide variety of changing +conditions, and the outputs are fully debounced. + +The touchkey device node should be placed inside an I2C bus node. + +Required properties: +- compatible: Must be "microchip,qt1050" +- reg: The I2C address of the device +- interrupts: The sink for the touchpad's IRQ output, + see ../interrupt-controller/interrupts.txt + +Optional properties: +- wakeup-source: touch keys can be used as a wakeup source + +Each button (key) is represented as a sub-node: + +Each key that is not specified, or whose linux,code is set to KEY_RESERVED, +gets disabled in HW. + +Subnode properties: +- linux,code: Keycode to emit. +- reg: The key number. Valid values: 0, 1, 2, 3, 4. + +Optional subnode-properties: + +If an optional property is missing or has an invalid value, the default value +is taken. + +- microchip,pre-charge-time-ns: + Each touchpad needs some time to precharge. The value depends on the mechanical + layout. + Valid value range: 0 - 637500; values must be a multiple of 2500; + default is 0. +- microchip,average-samples: + Number of data samples which are averaged for each read. + Valid values: 1, 4, 16, 64, 256, 1024, 4096, 16384; default is 1. +- microchip,average-scaling: + The scaling factor which is used to scale the average-samples. + Valid values: 1, 2, 4, 8, 16, 32, 64, 128; default is 1. +- microchip,threshold: + Number of counts to register a touch detection. + Valid value range: 0 - 255; default is 20. + +Example: +QT1050 with 3 non-continuous keys; key2 and key4 are disabled.
+ +touchkeys@41 { + compatible = "microchip,qt1050"; + reg = <0x41>; + interrupt-parent = <&gpio0>; + interrupts = <17 IRQ_TYPE_EDGE_FALLING>; + + up@0 { + reg = <0>; + linux,code = <KEY_UP>; + microchip,average-samples = <64>; + microchip,average-scaling = <16>; + microchip,pre-charge-time-ns = <10000>; + }; + + right@1 { + reg = <1>; + linux,code = <KEY_RIGHT>; + microchip,average-samples = <64>; + microchip,average-scaling = <8>; + }; + + down@3 { + reg = <3>; + linux,code = <KEY_DOWN>; + microchip,average-samples = <256>; + microchip,average-scaling = <16>; + }; +}; diff --git a/Documentation/devicetree/bindings/input/sun4i-lradc-keys.txt b/Documentation/devicetree/bindings/input/sun4i-lradc-keys.txt index 1458c3179a63..496125c6bfb7 100644 --- a/Documentation/devicetree/bindings/input/sun4i-lradc-keys.txt +++ b/Documentation/devicetree/bindings/input/sun4i-lradc-keys.txt @@ -2,12 +2,14 @@ Allwinner sun4i low res adc attached tablet keys ------------------------------------------------ Required properties: - - compatible: "allwinner,sun4i-a10-lradc-keys" + - compatible: should be one of the following strings: + "allwinner,sun4i-a10-lradc-keys" + "allwinner,sun8i-a83t-r-lradc" - reg: mmio address range of the chip - interrupts: interrupt to which the chip is connected - vref-supply: power supply for the lradc reference voltage -Each key is represented as a sub-node of "allwinner,sun4i-a10-lradc-keys": +Each key is represented as a sub-node of the compatible mentioned above: Required subnode-properties: - label: Descriptive name of the key. diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt index 8cf0b4d38a7e..fc03ea4cf5ab 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt +++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt @@ -3,6 +3,7 @@ Device tree bindings for Goodix GT9xx series touchscreen controller Required properties: - compatible : Should be "goodix,gt1151" + or "goodix,gt5663" or "goodix,gt5688" or "goodix,gt911" or "goodix,gt9110" @@ -19,6 +20,8 @@ Optional properties: - irq-gpios : GPIO pin used for IRQ. The driver uses the interrupt gpio pin as output to reset the device. - reset-gpios : GPIO pin used for reset + - AVDD28-supply : Analog power supply regulator on AVDD28 pin + - VDDIO-supply : GPIO power supply regulator on VDDIO pin - touchscreen-inverted-x - touchscreen-inverted-y - touchscreen-size-x diff --git a/Documentation/devicetree/bindings/input/touchscreen/iqs5xx.txt b/Documentation/devicetree/bindings/input/touchscreen/iqs5xx.txt new file mode 100644 index 000000000000..efa0820e2469 --- /dev/null +++ b/Documentation/devicetree/bindings/input/touchscreen/iqs5xx.txt @@ -0,0 +1,80 @@ +Azoteq IQS550/572/525 Trackpad/Touchscreen Controller + +Required properties: + +- compatible : Must be equal to one of the following: + "azoteq,iqs550" + "azoteq,iqs572" + "azoteq,iqs525" + +- reg : I2C slave address for the device. + +- interrupts : GPIO to which the device's active-high RDY + output is connected (see [0]). + +- reset-gpios : GPIO to which the device's active-low NRST + input is connected (see [1]). + +Optional properties: + +- touchscreen-min-x : See [2]. + +- touchscreen-min-y : See [2]. + +- touchscreen-size-x : See [2]. If this property is omitted, the + maximum x-coordinate is specified by the + device's "X Resolution" register. + +- touchscreen-size-y : See [2].
If this property is omitted, the + maximum y-coordinate is specified by the + device's "Y Resolution" register. + +- touchscreen-max-pressure : See [2]. Pressure is expressed as the sum of + the deltas across all channels impacted by a + touch event. A channel's delta is calculated + as its count value minus a reference, where + the count value is inversely proportional to + the channel's capacitance. + +- touchscreen-fuzz-x : See [2]. + +- touchscreen-fuzz-y : See [2]. + +- touchscreen-fuzz-pressure : See [2]. + +- touchscreen-inverted-x : See [2]. Inversion is applied relative to that + which may already be specified by the device's + FLIP_X and FLIP_Y register fields. + +- touchscreen-inverted-y : See [2]. Inversion is applied relative to that + which may already be specified by the device's + FLIP_X and FLIP_Y register fields. + +- touchscreen-swapped-x-y : See [2]. Swapping is applied relative to that + which may already be specified by the device's + SWITCH_XY_AXIS register field. + +[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt +[1]: Documentation/devicetree/bindings/gpio/gpio.txt +[2]: Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt + +Example: + + &i2c1 { + /* ... */ + + touchscreen@74 { + compatible = "azoteq,iqs550"; + reg = <0x74>; + interrupt-parent = <&gpio>; + interrupts = <17 4>; + reset-gpios = <&gpio 27 1>; + + touchscreen-size-x = <640>; + touchscreen-size-y = <480>; + + touchscreen-max-pressure = <16000>; + }; + + /* ... */ + }; diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt index 3a65aabc76a2..6262c2f293b0 100644 --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt @@ -139,9 +139,9 @@ Optional properties: sub-module attached to this interface. The MAC address will be determined using the optional properties defined in -ethernet.txt, as provided by the of_get_mac_address API and only if efuse-mac -is set to 0. If any of the optional MAC address properties are not present, -then the driver will use random MAC address. +ethernet.txt and only if efuse-mac is set to 0. If none of the optional MAC +address properties are present, then the driver will use a random MAC +address. Example binding: diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt index 74665502f4cf..7e675dafc256 100644 --- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt +++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt @@ -16,8 +16,8 @@ Optional properties: - ieee80211-freq-limit: See ieee80211.txt - mediatek,mtd-eeprom: Specify a MTD partition + offset containing EEPROM data -The driver is using of_get_mac_address API, so the MAC address can be as well -be set with corresponding optional properties defined in net/ethernet.txt. +The MAC address can also be set with the corresponding optional properties +defined in net/ethernet.txt. Optional nodes: - led: Properties for a connected LED diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 9ed399977297..e0e216002efd 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -55,6 +55,7 @@ avic Shanghai AVIC Optoelectronics Co., Ltd. avnet Avnet, Inc.
axentia Axentia Technologies AB axis Axis Communications AB +azoteq Azoteq (Pty) Ltd bananapi BIPAI KEJI LIMITED bhf Beckhoff Automation GmbH & Co. KG bitmain Bitmain Technologies diff --git a/Documentation/index.rst b/Documentation/index.rst index 9e01aace4f48..a7566ef62411 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -114,6 +114,7 @@ implementation. x86/index sh/index + x86/index Filesystem Documentation ------------------------ diff --git a/Documentation/x86/conf.py b/Documentation/x86/conf.py new file mode 100644 index 000000000000..33c5c3142e20 --- /dev/null +++ b/Documentation/x86/conf.py @@ -0,0 +1,10 @@ +# -*- coding: utf-8; mode: python -*- + +project = "X86 architecture specific documentation" + +tags.add("subproject") + +latex_documents = [ + ('index', 'x86.tex', project, + 'The kernel development community', 'manual'), +] diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index 73a487957fd4..ae36fc5fc649 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -23,6 +23,7 @@ x86-specific Documentation intel_mpx amd-memory-encryption pti + mds microcode resctrl_ui usb-legacy-support diff --git a/Documentation/x86/mds.rst b/Documentation/x86/mds.rst new file mode 100644 index 000000000000..534e9baa4e1d --- /dev/null +++ b/Documentation/x86/mds.rst @@ -0,0 +1,225 @@ +Microarchitectural Data Sampling (MDS) mitigation +================================================= + +.. _mds: + +Overview +-------- + +Microarchitectural Data Sampling (MDS) is a family of side channel attacks +on internal buffers in Intel CPUs. The variants are: + + - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126) + - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130) + - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127) + - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091) + +MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a +dependent load (store-to-load forwarding) as an optimization. The forward +can also happen to a faulting or assisting load operation for a different +memory address, which can be exploited under certain conditions. Store +buffers are partitioned between Hyper-Threads, so cross thread forwarding is +not possible. But if a thread enters or exits a sleep state, the store +buffer is repartitioned, which can expose data from one thread to the other. + +MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage +L1 miss situations and to hold data which is returned or sent in response +to a memory or I/O operation. Fill buffers can forward data to a load +operation and also write data to the cache. When the fill buffer is +deallocated, it can retain the stale data of the preceding operations which +can then be forwarded to a faulting or assisting load operation, which can +be exploited under certain conditions. Fill buffers are shared between +Hyper-Threads, so cross thread leakage is possible. + +MLPDS leaks Load Port Data. Load ports are used to perform load operations +from memory or I/O. The received data is then forwarded to the register +file or a subsequent operation. In some implementations the Load Port can +contain stale data from a previous operation which can be forwarded to +faulting or assisting loads under certain conditions, which again can be +exploited eventually. Load ports are shared between Hyper-Threads, so cross +thread leakage is possible.
+ +MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from +memory that takes a fault or assist can leave data in a microarchitectural +structure that may later be observed using one of the same methods used by +MSBDS, MFBDS or MLPDS. + +Exposure assumptions +-------------------- + +It is assumed that attack code resides in user space or in a guest, with one +exception. The rationale behind this assumption is that the code construct +needed for exploiting MDS requires: + + - to control the load to trigger a fault or assist + + - to have a disclosure gadget which exposes the speculatively accessed + data for consumption through a side channel + + - to control the pointer through which the disclosure gadget exposes the + data + +The existence of such a construct in the kernel cannot be excluded with +100% certainty, but the complexity involved makes it extremely unlikely. + +There is one exception, which is untrusted BPF. The functionality of +untrusted BPF is limited, but it needs to be thoroughly investigated +whether it can be used to create such a construct. + + +Mitigation strategy +------------------- + +All variants have the same mitigation strategy, at least for the single CPU +thread case (SMT off): Force the CPU to clear the affected buffers. + +This is achieved by using the otherwise unused and obsolete VERW +instruction in combination with a microcode update. The microcode clears +the affected CPU buffers when the VERW instruction is executed. + +For virtualization there are two ways to achieve CPU buffer +clearing: either via the modified VERW instruction or via the L1D Flush +command. The latter is issued when the L1TF mitigation is enabled, so the extra +VERW can be avoided. If the CPU is not affected by L1TF, then VERW needs to +be issued. + +If the VERW instruction with the supplied segment selector argument is +executed on a CPU without the microcode update, there is no side effect +other than a small number of pointlessly wasted CPU cycles. + +This does not protect against cross Hyper-Thread attacks except for MSBDS, +which is only exploitable cross Hyper-Thread when one of the Hyper-Threads +enters a C-state. + +The kernel provides a function to invoke the buffer clearing: + + mds_clear_cpu_buffers() + +The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state +(idle) transitions. + +As a special quirk to address virtualization scenarios where the host has +the microcode updated, but the hypervisor does not (yet) expose the +MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the +hope that it might actually clear the buffers. The state is reflected +accordingly. + +According to current knowledge, additional mitigations inside the kernel +itself are not required because the necessary gadgets to expose the leaked +data cannot be controlled in a way which allows exploitation from malicious +user space or VM guests. + +Kernel internal mitigation modes +-------------------------------- + + ======= ============================================================ + off Mitigation is disabled. Either the CPU is not affected or + mds=off is supplied on the kernel command line + + full Mitigation is enabled. CPU is affected and MD_CLEAR is + advertised in CPUID. + + vmwerv Mitigation is enabled. CPU is affected and MD_CLEAR is not + advertised in CPUID. That is mainly for virtualization + scenarios where the host has the updated microcode but the + hypervisor does not expose MD_CLEAR in CPUID. It's a best + effort approach without guarantee.
+ ======= ============================================================ + +If the CPU is affected and mds=off is not supplied on the kernel command +line, then the kernel selects the appropriate mitigation mode depending on +the availability of the MD_CLEAR CPUID bit. + +Mitigation points +----------------- + +1. Return to user space +^^^^^^^^^^^^^^^^^^^^^^^ + + When transitioning from kernel to user space, the CPU buffers are flushed + on affected CPUs when the mitigation is not disabled on the kernel + command line. The mitigation is enabled through the static key + mds_user_clear. + + The mitigation is invoked in prepare_exit_to_usermode() which covers + most of the kernel to user space transitions. There are a few exceptions + which do not invoke prepare_exit_to_usermode() on return to user + space. These exceptions use the paranoid exit code. + + - Non Maskable Interrupt (NMI): + + Access to sensitive data like keys or credentials in the NMI context is + mostly theoretical: The CPU can do prefetching or execute a + misspeculated code path and thereby fetch data which might end up + leaking through a buffer. + + But for mounting other attacks the kernel stack address of the task is + already valuable information. So in full mitigation mode, the NMI is + mitigated on the return from do_nmi() to provide almost complete + coverage. + + - Double fault (#DF): + + A double fault is usually fatal, but the ESPFIX workaround, which can + be triggered from user space through modify_ldt(2), is a recoverable + double fault. #DF uses the paranoid exit path, so explicit mitigation + in the double fault handler is required. + + - Machine Check Exception (#MC): + + Another corner case is a #MC which hits between the CPU buffer clear + invocation and the actual return to user. As this still is in kernel + space it takes the paranoid exit path which does not clear the CPU + buffers. So the #MC handler repopulates the buffers to some + extent. Machine checks are not reliably controllable and the window is + extremely small, so mitigation would just tick a checkbox that this + theoretical corner case is covered. To keep the amount of special + cases small, ignore #MC. + + - Debug Exception (#DB): + + This takes the paranoid exit path only when the INT1 breakpoint is in + kernel space. #DB on a user space address takes the regular exit path, + so no extra mitigation is required. + + +2. C-State transition +^^^^^^^^^^^^^^^^^^^^^ + + When a CPU goes idle and enters a C-State, the CPU buffers need to be + cleared on affected CPUs when SMT is active. This addresses the + repartitioning of the store buffer when one of the Hyper-Threads enters + a C-State. + + When SMT is inactive, i.e. either the CPU does not support it or all + sibling threads are offline, CPU buffer clearing is not required. + + The idle clearing is enabled on CPUs which are only affected by MSBDS + and not by any other MDS variant. The other MDS variants cannot be + protected against cross Hyper-Thread attacks because the Fill Buffer and + the Load Ports are shared. So on CPUs affected by other variants, the + idle clearing would be a window dressing exercise and is therefore not + activated. + + The invocation is controlled by the static key mds_idle_clear, which is + switched depending on the chosen mitigation mode and the SMT state of + the system.
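   Condensed into one place, the resulting idle hook has the following
   shape; this is a sketch that mirrors the mds_idle_clear_cpu_buffers()
   and native_safe_halt() hunks in the arch/x86/include/asm changes
   further down, not the literal kernel source::

      static inline void mds_idle_clear_cpu_buffers(void)
      {
              /* mds_idle_clear is flipped by update_mds_branch_idle() on SMT changes */
              if (static_branch_likely(&mds_idle_clear))
                      mds_clear_cpu_buffers();        /* VERW based buffer flush */
      }

      static inline __cpuidle void native_safe_halt(void)
      {
              /* Clear the buffers before the sibling can observe stale entries */
              mds_idle_clear_cpu_buffers();
              asm volatile("sti; hlt": : :"memory");
      }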
+ + The buffer clear is only invoked before entering the C-State to prevent + stale data from the idling CPU from spilling to the Hyper-Thread + sibling after the store buffer got repartitioned and all entries are + available to the non-idle sibling. + + When coming out of idle, the store buffer is partitioned again so each + sibling has half of it available. The CPU coming back from idle could then + be speculatively exposed to contents of the sibling. The buffers are + flushed either on exit to user space or on VMENTER, so malicious code + in user space or the guest cannot speculatively access them. + + The mitigation is hooked into all variants of halt()/mwait(), but does + not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver + has been superseded by the intel_idle driver around 2010 and is + preferred on all affected CPUs which are expected to gain the MD_CLEAR + functionality in microcode. Aside from that, the IO-Port mechanism is a + legacy interface which is only used on older systems which are either + not affected or do not receive microcode updates anymore. diff --git a/arch/powerpc/sysdev/tsi108_dev.c b/arch/powerpc/sysdev/tsi108_dev.c index c92dcac85231..026619c9a8cb 100644 --- a/arch/powerpc/sysdev/tsi108_dev.c +++ b/arch/powerpc/sysdev/tsi108_dev.c @@ -18,6 +18,7 @@ #include <linux/irq.h> #include <linux/export.h> #include <linux/device.h> +#include <linux/etherdevice.h> #include <linux/platform_device.h> #include <linux/of_net.h> #include <asm/tsi108.h> @@ -106,7 +107,7 @@ static int __init tsi108_eth_of_init(void) mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(tsi_eth_data.mac_addr, mac_addr, 6); + ether_addr_copy(tsi_eth_data.mac_addr, mac_addr); ph = of_get_property(np, "mdio-handle", NULL); mdio = of_find_node_by_phandle(*ph); diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 51beb8d29123..a986b3c8294c 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -32,6 +32,7 @@ #include <asm/vdso.h> #include <asm/cpufeature.h> #include <asm/fpu/api.h> +#include <asm/nospec-branch.h> #define CREATE_TRACE_POINTS #include <trace/events/syscalls.h> @@ -220,6 +221,8 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs) #endif user_enter_irqoff(); + + mds_user_clear_cpu_buffers(); } #define SYSCALL_EXIT_WORK_FLAGS \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 981ff9479648..75f27ee2c263 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -344,6 +344,7 @@ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */ #define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */ #define X86_FEATURE_TSX_FORCE_ABORT (18*32+13) /* "" TSX_FORCE_ABORT */ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ @@ -382,5 +383,7 @@ #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ #define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ +#define X86_BUG_MDS X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */ +#define X86_BUG_MSBDS_ONLY
X86_BUG(20) /* CPU is only affected by the MSBDS variant of BUG_MDS */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 058e40fed167..8a0e56e1dcc9 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -6,6 +6,8 @@ #ifndef __ASSEMBLY__ +#include <asm/nospec-branch.h> + /* Provide __cpuidle; we can't safely include <linux/cpu.h> */ #define __cpuidle __attribute__((__section__(".cpuidle.text"))) @@ -54,11 +56,13 @@ static inline void native_irq_enable(void) static inline __cpuidle void native_safe_halt(void) { + mds_idle_clear_cpu_buffers(); asm volatile("sti; hlt": : :"memory"); } static inline __cpuidle void native_halt(void) { + mds_idle_clear_cpu_buffers(); asm volatile("hlt": : :"memory"); } diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 1378518cf63f..88dd202c8b00 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_MSR_INDEX_H #define _ASM_X86_MSR_INDEX_H +#include <linux/bits.h> + /* * CPU model specific register (MSR) numbers. * @@ -40,14 +42,14 @@ /* Intel MSRs. Some also available on other CPUs */ #define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ -#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */ +#define SPEC_CTRL_IBRS BIT(0) /* Indirect Branch Restricted Speculation */ #define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ -#define SPEC_CTRL_STIBP (1 << SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ +#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ #define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ -#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ +#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ #define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ -#define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */ +#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */ #define MSR_PPIN_CTL 0x0000004e #define MSR_PPIN 0x0000004f @@ -69,20 +71,25 @@ #define MSR_MTRRcap 0x000000fe #define MSR_IA32_ARCH_CAPABILITIES 0x0000010a -#define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */ -#define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */ -#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH (1 << 3) /* Skip L1D flush on vmentry */ -#define ARCH_CAP_SSB_NO (1 << 4) /* - * Not susceptible to Speculative Store Bypass - * attack, so no Speculative Store Bypass - * control required. - */ +#define ARCH_CAP_RDCL_NO BIT(0) /* Not susceptible to Meltdown */ +#define ARCH_CAP_IBRS_ALL BIT(1) /* Enhanced IBRS support */ +#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH BIT(3) /* Skip L1D flush on vmentry */ +#define ARCH_CAP_SSB_NO BIT(4) /* + * Not susceptible to Speculative Store Bypass + * attack, so no Speculative Store Bypass + * control required. + */ +#define ARCH_CAP_MDS_NO BIT(5) /* + * Not susceptible to + * Microarchitectural Data + * Sampling (MDS) vulnerabilities. + */ #define MSR_IA32_FLUSH_CMD 0x0000010b -#define L1D_FLUSH (1 << 0) /* - * Writeback and invalidate the - * L1 data cache. - */ +#define L1D_FLUSH BIT(0) /* + * Writeback and invalidate the + * L1 data cache.
+ */ #define MSR_IA32_BBL_CR_CTL 0x00000119 #define MSR_IA32_BBL_CR_CTL3 0x0000011e diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h index 39a2fb29378a..eb0f80ce8524 100644 --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -6,6 +6,7 @@ #include <linux/sched/idle.h> #include <asm/cpufeature.h> +#include <asm/nospec-branch.h> #define MWAIT_SUBSTATE_MASK 0xf #define MWAIT_CSTATE_MASK 0xf @@ -40,6 +41,8 @@ static inline void __monitorx(const void *eax, unsigned long ecx, static inline void __mwait(unsigned long eax, unsigned long ecx) { + mds_idle_clear_cpu_buffers(); + /* "mwait %eax, %ecx;" */ asm volatile(".byte 0x0f, 0x01, 0xc9;" :: "a" (eax), "c" (ecx)); @@ -74,6 +77,8 @@ static inline void __mwait(unsigned long eax, unsigned long ecx) static inline void __mwaitx(unsigned long eax, unsigned long ebx, unsigned long ecx) { + /* No MDS buffer clear as this is AMD/HYGON only */ + /* "mwaitx %eax, %ebx, %ecx;" */ asm volatile(".byte 0x0f, 0x01, 0xfb;" :: "a" (eax), "b" (ebx), "c" (ecx)); @@ -81,6 +86,8 @@ static inline void __mwaitx(unsigned long eax, unsigned long ebx, static inline void __sti_mwait(unsigned long eax, unsigned long ecx) { + mds_idle_clear_cpu_buffers(); + trace_hardirqs_on(); /* "mwait %eax, %ecx;" */ asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index daf25b60c9e3..109f974f9835 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -308,6 +308,56 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb); +DECLARE_STATIC_KEY_FALSE(mds_user_clear); +DECLARE_STATIC_KEY_FALSE(mds_idle_clear); + +#include <asm/segment.h> + +/** + * mds_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * This uses the otherwise unused and obsolete VERW instruction in + * combination with microcode which triggers a CPU buffer flush when the + * instruction is executed. + */ +static inline void mds_clear_cpu_buffers(void) +{ + static const u16 ds = __KERNEL_DS; + + /* + * Has to be the memory-operand variant because only that + * guarantees the CPU buffer flush functionality according to + * documentation. The register-operand variant does not. + * Works with any segment selector, but a valid writable + * data segment is the fastest variant. + * + * "cc" clobber is required because VERW modifies ZF. 
+ */ + asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc"); +} + +/** + * mds_user_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * Clear CPU buffers if the corresponding static key is enabled + */ +static inline void mds_user_clear_cpu_buffers(void) +{ + if (static_branch_likely(&mds_user_clear)) + mds_clear_cpu_buffers(); +} + +/** + * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability + * + * Clear CPU buffers if the corresponding static key is enabled + */ +static inline void mds_idle_clear_cpu_buffers(void) +{ + if (static_branch_likely(&mds_idle_clear)) + mds_clear_cpu_buffers(); +} + #endif /* __ASSEMBLY__ */ /* diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 7e99ef67bff0..c34a35c78618 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -978,4 +978,10 @@ enum l1tf_mitigations { extern enum l1tf_mitigations l1tf_mitigation; +enum mds_mitigations { + MDS_MITIGATION_OFF, + MDS_MITIGATION_FULL, + MDS_MITIGATION_VMWERV, +}; + #endif /* _ASM_X86_PROCESSOR_H */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 29630393f300..03b4cc0ec3a7 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -37,6 +37,7 @@ static void __init spectre_v2_select_mitigation(void); static void __init ssb_select_mitigation(void); static void __init l1tf_select_mitigation(void); +static void __init mds_select_mitigation(void); /* The base value of the SPEC_CTRL MSR that always has to be preserved. */ u64 x86_spec_ctrl_base; @@ -63,6 +64,13 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); /* Control unconditional IBPB in switch_mm() */ DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb); +/* Control MDS CPU buffer clear before returning to user space */ +DEFINE_STATIC_KEY_FALSE(mds_user_clear); +EXPORT_SYMBOL_GPL(mds_user_clear); +/* Control MDS CPU buffer clear before idling (halt, mwait) */ +DEFINE_STATIC_KEY_FALSE(mds_idle_clear); +EXPORT_SYMBOL_GPL(mds_idle_clear); + void __init check_bugs(void) { identify_boot_cpu(); @@ -101,6 +109,10 @@ void __init check_bugs(void) l1tf_select_mitigation(); + mds_select_mitigation(); + + arch_smt_update(); + #ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. 
@@ -207,6 +219,61 @@ static void x86_amd_ssb_disable(void) } #undef pr_fmt +#define pr_fmt(fmt) "MDS: " fmt + +/* Default mitigation for MDS-affected CPUs */ +static enum mds_mitigations mds_mitigation __ro_after_init = MDS_MITIGATION_FULL; +static bool mds_nosmt __ro_after_init = false; + +static const char * const mds_strings[] = { + [MDS_MITIGATION_OFF] = "Vulnerable", + [MDS_MITIGATION_FULL] = "Mitigation: Clear CPU buffers", + [MDS_MITIGATION_VMWERV] = "Vulnerable: Clear CPU buffers attempted, no microcode", +}; + +static void __init mds_select_mitigation(void) +{ + if (!boot_cpu_has_bug(X86_BUG_MDS) || cpu_mitigations_off()) { + mds_mitigation = MDS_MITIGATION_OFF; + return; + } + + if (mds_mitigation == MDS_MITIGATION_FULL) { + if (!boot_cpu_has(X86_FEATURE_MD_CLEAR)) + mds_mitigation = MDS_MITIGATION_VMWERV; + + static_branch_enable(&mds_user_clear); + + if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) && + (mds_nosmt || cpu_mitigations_auto_nosmt())) + cpu_smt_disable(false); + } + + pr_info("%s\n", mds_strings[mds_mitigation]); +} + +static int __init mds_cmdline(char *str) +{ + if (!boot_cpu_has_bug(X86_BUG_MDS)) + return 0; + + if (!str) + return -EINVAL; + + if (!strcmp(str, "off")) + mds_mitigation = MDS_MITIGATION_OFF; + else if (!strcmp(str, "full")) + mds_mitigation = MDS_MITIGATION_FULL; + else if (!strcmp(str, "full,nosmt")) { + mds_mitigation = MDS_MITIGATION_FULL; + mds_nosmt = true; + } + + return 0; +} +early_param("mds", mds_cmdline); + +#undef pr_fmt #define pr_fmt(fmt) "Spectre V2 : " fmt static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init = @@ -575,9 +642,6 @@ specv2_set_mode: /* Set up IBPB and STIBP depending on the general spectre V2 command */ spectre_v2_user_select_mitigation(cmd); - - /* Enable STIBP if appropriate */ - arch_smt_update(); } static void update_stibp_msr(void * __unused) @@ -611,6 +675,31 @@ static void update_indir_branch_cond(void) static_branch_disable(&switch_to_cond_stibp); } +#undef pr_fmt +#define pr_fmt(fmt) fmt + +/* Update the static key controlling the MDS CPU buffer clear in idle */ +static void update_mds_branch_idle(void) +{ + /* + * Enable the idle clearing if SMT is active on CPUs which are + * affected only by MSBDS and not any other MDS variant. + * + * The other variants cannot be mitigated when SMT is enabled, so + * clearing the buffers on idle just to prevent the Store Buffer + * repartitioning leak would be a window dressing exercise. + */ + if (!boot_cpu_has_bug(X86_BUG_MSBDS_ONLY)) + return; + + if (sched_smt_active()) + static_branch_enable(&mds_idle_clear); + else + static_branch_disable(&mds_idle_clear); +} + +#define MDS_MSG_SMT "MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.\n" + void arch_smt_update(void) { /* Enhanced IBRS implies STIBP. No update required. 
*/ @@ -632,6 +721,17 @@ void arch_smt_update(void) break; } + switch (mds_mitigation) { + case MDS_MITIGATION_FULL: + case MDS_MITIGATION_VMWERV: + if (sched_smt_active() && !boot_cpu_has(X86_BUG_MSBDS_ONLY)) + pr_warn_once(MDS_MSG_SMT); + update_mds_branch_idle(); + break; + case MDS_MITIGATION_OFF: + break; + } + mutex_unlock(&spec_ctrl_mutex); } @@ -1043,7 +1143,7 @@ static void __init l1tf_select_mitigation(void) pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n", half_pa); pr_info("However, doing so will make a part of your RAM unusable.\n"); - pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html might help you decide.\n"); + pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html might help you decide.\n"); return; } @@ -1076,6 +1176,7 @@ static int __init l1tf_cmdline(char *str) early_param("l1tf", l1tf_cmdline); #undef pr_fmt +#define pr_fmt(fmt) fmt #ifdef CONFIG_SYSFS @@ -1114,6 +1215,23 @@ static ssize_t l1tf_show_state(char *buf) } #endif +static ssize_t mds_show_state(char *buf) +{ + if (!hypervisor_is_type(X86_HYPER_NATIVE)) { + return sprintf(buf, "%s; SMT Host state unknown\n", + mds_strings[mds_mitigation]); + } + + if (boot_cpu_has(X86_BUG_MSBDS_ONLY)) { + return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation], + (mds_mitigation == MDS_MITIGATION_OFF ? "vulnerable" : + sched_smt_active() ? "mitigated" : "disabled")); + } + + return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation], + sched_smt_active() ? "vulnerable" : "disabled"); +} + static char *stibp_state(void) { if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) @@ -1180,6 +1298,10 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV)) return l1tf_show_state(buf); break; + + case X86_BUG_MDS: + return mds_show_state(buf); + default: break; } @@ -1211,4 +1333,9 @@ ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *b { return cpu_show_common(dev, attr, buf, X86_BUG_L1TF); } + +ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *buf) +{ + return cpu_show_common(dev, attr, buf, X86_BUG_MDS); +} #endif diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8739bdfe9bdf..d7f55ad2dfb1 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -940,61 +940,77 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) #endif } -static const __initconst struct x86_cpu_id cpu_no_speculation[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_TABLET, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL_MID, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SALTWELL_MID, X86_FEATURE_ANY }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_BONNELL, X86_FEATURE_ANY }, - { X86_VENDOR_CENTAUR, 5 }, - { X86_VENDOR_INTEL, 5 }, - { X86_VENDOR_NSC, 5 }, - { X86_VENDOR_ANY, 4 }, +#define NO_SPECULATION BIT(0) +#define NO_MELTDOWN BIT(1) +#define NO_SSB BIT(2) +#define NO_L1TF BIT(3) +#define NO_MDS BIT(4) +#define MSBDS_ONLY BIT(5) + +#define VULNWL(_vendor, _family, _model, _whitelist) \ + { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } + +#define VULNWL_INTEL(model, whitelist) \ + VULNWL(INTEL, 6, INTEL_FAM6_##model, whitelist) + +#define VULNWL_AMD(family, whitelist) \ + VULNWL(AMD, family, X86_MODEL_ANY, whitelist) + +#define 
VULNWL_HYGON(family, whitelist) \ + VULNWL(HYGON, family, X86_MODEL_ANY, whitelist) + +static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = { + VULNWL(ANY, 4, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(CENTAUR, 5, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(INTEL, 5, X86_MODEL_ANY, NO_SPECULATION), + VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION), + + /* Intel Family 6 */ + VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION), + VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION), + VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION), + VULNWL_INTEL(ATOM_BONNELL, NO_SPECULATION), + VULNWL_INTEL(ATOM_BONNELL_MID, NO_SPECULATION), + + VULNWL_INTEL(ATOM_SILVERMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_SILVERMONT_X, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_SILVERMONT_MID, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(ATOM_AIRMONT, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(XEON_PHI_KNL, NO_SSB | NO_L1TF | MSBDS_ONLY), + VULNWL_INTEL(XEON_PHI_KNM, NO_SSB | NO_L1TF | MSBDS_ONLY), + + VULNWL_INTEL(CORE_YONAH, NO_SSB), + + VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY), + + VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF), + + /* AMD Family 0xf - 0x12 */ + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS), + + /* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS), + VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS), {} }; -static const __initconst struct x86_cpu_id cpu_no_meltdown[] = { - { X86_VENDOR_AMD }, - { X86_VENDOR_HYGON }, - {} -}; - -/* Only list CPUs which speculate but are non susceptible to SSB */ -static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, - { X86_VENDOR_AMD, 0x12, }, - { X86_VENDOR_AMD, 0x11, }, - { X86_VENDOR_AMD, 0x10, }, - { X86_VENDOR_AMD, 0xf, }, - {} -}; +static bool __init cpu_matches(unsigned long which) +{ + const struct x86_cpu_id *m = x86_match_cpu(cpu_vuln_whitelist); -static const __initconst struct x86_cpu_id cpu_no_l1tf[] = { - /* in addition to cpu_no_speculation */ - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT_MID }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_X }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT_PLUS }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, - { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, - {} -}; + return m && !!(m->driver_data & which); +} static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) { u64 ia32_cap = 0; - if (x86_match_cpu(cpu_no_speculation)) + if (cpu_matches(NO_SPECULATION)) return; setup_force_cpu_bug(X86_BUG_SPECTRE_V1); @@ 
-1003,15 +1019,20 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES)) rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap); - if (!x86_match_cpu(cpu_no_spec_store_bypass) && - !(ia32_cap & ARCH_CAP_SSB_NO) && + if (!cpu_matches(NO_SSB) && !(ia32_cap & ARCH_CAP_SSB_NO) && !cpu_has(c, X86_FEATURE_AMD_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS); if (ia32_cap & ARCH_CAP_IBRS_ALL) setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED); - if (x86_match_cpu(cpu_no_meltdown)) + if (!cpu_matches(NO_MDS) && !(ia32_cap & ARCH_CAP_MDS_NO)) { + setup_force_cpu_bug(X86_BUG_MDS); + if (cpu_matches(MSBDS_ONLY)) + setup_force_cpu_bug(X86_BUG_MSBDS_ONLY); + } + + if (cpu_matches(NO_MELTDOWN)) return; /* Rogue Data Cache Load? No! */ @@ -1020,7 +1041,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN); - if (x86_match_cpu(cpu_no_l1tf)) + if (cpu_matches(NO_L1TF)) return; setup_force_cpu_bug(X86_BUG_L1TF); diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 3755d0310026..05b09896cfaf 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -35,6 +35,7 @@ #include <asm/x86_init.h> #include <asm/reboot.h> #include <asm/cache.h> +#include <asm/nospec-branch.h> #define CREATE_TRACE_POINTS #include <trace/events/nmi.h> @@ -551,6 +552,9 @@ nmi_restart: write_cr2(this_cpu_read(nmi_cr2)); if (this_cpu_dec_return(nmi_state)) goto nmi_restart; + + if (user_mode(regs)) + mds_user_clear_cpu_buffers(); } NOKPROBE_SYMBOL(do_nmi); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 8b6d03e55d2f..7de466eb960b 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -58,6 +58,7 @@ #include <asm/alternative.h> #include <asm/fpu/xstate.h> #include <asm/trace/mpx.h> +#include <asm/nospec-branch.h> #include <asm/mpx.h> #include <asm/vm86.h> #include <asm/umip.h> @@ -367,6 +368,13 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) regs->ip = (unsigned long)general_protection; regs->sp = (unsigned long)&gpregs->orig_ax; + /* + * This situation can be triggered by userspace via + * modify_ldt(2) and the return does not take the regular + * user space exit, so a CPU buffer clear is required when + * MDS mitigation is enabled. 
+ */ + mds_user_clear_cpu_buffers(); return; } #endif diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index fd3951638ae4..bbbe611f0c49 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -410,7 +410,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP); + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) | + F(MD_CLEAR); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 9663d41cc2bc..e1fa935a545f 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6431,8 +6431,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu) */ x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0); + /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); + else if (static_branch_unlikely(&mds_user_clear)) + mds_clear_cpu_buffers(); if (vcpu->arch.cr2 != read_cr2()) write_cr2(vcpu->arch.cr2); @@ -6668,8 +6671,8 @@ free_partial_vcpu: return ERR_PTR(err); } -#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n" -#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n" +#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" +#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. 
See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n" static int vmx_vm_init(struct kvm *kvm) { diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index 668139cfa664..cc37511de866 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -548,11 +548,18 @@ ssize_t __weak cpu_show_l1tf(struct device *dev, return sprintf(buf, "Not affected\n"); } +ssize_t __weak cpu_show_mds(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL); static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL); +static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL); static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, @@ -560,6 +567,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_spectre_v2.attr, &dev_attr_spec_store_bypass.attr, &dev_attr_l1tf.attr, + &dev_attr_mds.attr, NULL }; diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 46c6efea1404..abdb01879caa 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -1051,6 +1051,8 @@ static void hidinput_configure_usage(struct hid_input *hidinput, struct hid_fiel case 0x28b: map_key_clear(KEY_FORWARDMAIL); break; case 0x28c: map_key_clear(KEY_SEND); break; + case 0x29d: map_key_clear(KEY_KBD_LAYOUT_NEXT); break; + case 0x2c7: map_key_clear(KEY_KBDINPUTASSIST_PREV); break; case 0x2c8: map_key_clear(KEY_KBDINPUTASSIST_NEXT); break; case 0x2c9: map_key_clear(KEY_KBDINPUTASSIST_PREVGROUP); break; diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c index f040d8881ff2..d1e25aba8212 100644 --- a/drivers/input/evdev.c +++ b/drivers/input/evdev.c @@ -503,14 +503,13 @@ static int evdev_open(struct inode *inode, struct file *file) { struct evdev *evdev = container_of(inode->i_cdev, struct evdev, cdev); unsigned int bufsize = evdev_compute_buffer_size(evdev->handle.dev); - unsigned int size = sizeof(struct evdev_client) + - bufsize * sizeof(struct input_event); struct evdev_client *client; int error; - client = kzalloc(size, GFP_KERNEL | __GFP_NOWARN); + client = kzalloc(struct_size(client, buffer, bufsize), + GFP_KERNEL | __GFP_NOWARN); if (!client) - client = vzalloc(size); + client = vzalloc(struct_size(client, buffer, bufsize)); if (!client) return -ENOMEM; diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig index 52d7f55fca32..1fe039d7326b 100644 --- a/drivers/input/keyboard/Kconfig +++ b/drivers/input/keyboard/Kconfig @@ -137,6 +137,17 @@ config KEYBOARD_ATKBD_RDI_KEYCODES right-hand column will be interpreted as the key shown in the left-hand column. +config KEYBOARD_QT1050 + tristate "Microchip AT42QT1050 Touch Sensor Chip" + depends on I2C + select REGMAP_I2C + help + Say Y here if you want to use the Microchip AT42QT1050 QTouch + Sensor chip as an input device.
+ + To compile this driver as a module, choose M here: + the module will be called qt1050 + config KEYBOARD_QT1070 tristate "Atmel AT42QT1070 Touch Sensor Chip" depends on I2C diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile index 182e92985dbf..f0291ca39f62 100644 --- a/drivers/input/keyboard/Makefile +++ b/drivers/input/keyboard/Makefile @@ -50,6 +50,7 @@ obj-$(CONFIG_KEYBOARD_OPENCORES) += opencores-kbd.o obj-$(CONFIG_KEYBOARD_PMIC8XXX) += pmic8xxx-keypad.o obj-$(CONFIG_KEYBOARD_PXA27x) += pxa27x_keypad.o obj-$(CONFIG_KEYBOARD_PXA930_ROTARY) += pxa930_rotary.o +obj-$(CONFIG_KEYBOARD_QT1050) += qt1050.o obj-$(CONFIG_KEYBOARD_QT1070) += qt1070.o obj-$(CONFIG_KEYBOARD_QT2160) += qt2160.o obj-$(CONFIG_KEYBOARD_SAMSUNG) += samsung-keypad.o diff --git a/drivers/input/keyboard/atkbd.c b/drivers/input/keyboard/atkbd.c index 850bb259c20e..3ad93e3e2f4c 100644 --- a/drivers/input/keyboard/atkbd.c +++ b/drivers/input/keyboard/atkbd.c @@ -401,6 +401,8 @@ static irqreturn_t atkbd_interrupt(struct serio *serio, unsigned char data, if (ps2_handle_response(&atkbd->ps2dev, data)) goto out; + pm_wakeup_event(&serio->dev, 0); + if (!atkbd->enabled) goto out; diff --git a/drivers/input/keyboard/qt1050.c b/drivers/input/keyboard/qt1050.c new file mode 100644 index 000000000000..403060d05c3b --- /dev/null +++ b/drivers/input/keyboard/qt1050.c @@ -0,0 +1,598 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Microchip AT42QT1050 QTouch Sensor Controller + * + * Copyright (C) 2019 Pengutronix, Marco Felsch <kernel@pengutronix.de> + * + * Base on AT42QT1070 driver by: + * Bo Shen <voice.shen@atmel.com> + * Copyright (C) 2011 Atmel + */ + +#include <linux/delay.h> +#include <linux/i2c.h> +#include <linux/input.h> +#include <linux/interrupt.h> +#include <linux/kernel.h> +#include <linux/log2.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/regmap.h> + +/* Chip ID */ +#define QT1050_CHIP_ID 0x00 +#define QT1050_CHIP_ID_VER 0x46 + +/* Firmware version */ +#define QT1050_FW_VERSION 0x01 + +/* Detection status */ +#define QT1050_DET_STATUS 0x02 + +/* Key status */ +#define QT1050_KEY_STATUS 0x03 + +/* Key Signals */ +#define QT1050_KEY_SIGNAL_0_MSB 0x06 +#define QT1050_KEY_SIGNAL_0_LSB 0x07 +#define QT1050_KEY_SIGNAL_1_MSB 0x08 +#define QT1050_KEY_SIGNAL_1_LSB 0x09 +#define QT1050_KEY_SIGNAL_2_MSB 0x0c +#define QT1050_KEY_SIGNAL_2_LSB 0x0d +#define QT1050_KEY_SIGNAL_3_MSB 0x0e +#define QT1050_KEY_SIGNAL_3_LSB 0x0f +#define QT1050_KEY_SIGNAL_4_MSB 0x10 +#define QT1050_KEY_SIGNAL_4_LSB 0x11 + +/* Reference data */ +#define QT1050_REF_DATA_0_MSB 0x14 +#define QT1050_REF_DATA_0_LSB 0x15 +#define QT1050_REF_DATA_1_MSB 0x16 +#define QT1050_REF_DATA_1_LSB 0x17 +#define QT1050_REF_DATA_2_MSB 0x1a +#define QT1050_REF_DATA_2_LSB 0x1b +#define QT1050_REF_DATA_3_MSB 0x1c +#define QT1050_REF_DATA_3_LSB 0x1d +#define QT1050_REF_DATA_4_MSB 0x1e +#define QT1050_REF_DATA_4_LSB 0x1f + +/* Negative threshold level */ +#define QT1050_NTHR_0 0x21 +#define QT1050_NTHR_1 0x22 +#define QT1050_NTHR_2 0x24 +#define QT1050_NTHR_3 0x25 +#define QT1050_NTHR_4 0x26 + +/* Pulse / Scale */ +#define QT1050_PULSE_SCALE_0 0x28 +#define QT1050_PULSE_SCALE_1 0x29 +#define QT1050_PULSE_SCALE_2 0x2b +#define QT1050_PULSE_SCALE_3 0x2c +#define QT1050_PULSE_SCALE_4 0x2d + +/* Detection integrator counter / AKS */ +#define QT1050_DI_AKS_0 0x2f +#define QT1050_DI_AKS_1 0x30 +#define QT1050_DI_AKS_2 0x32 +#define QT1050_DI_AKS_3 0x33 +#define QT1050_DI_AKS_4 0x34 + +/* Charge Share Delay */ +#define QT1050_CSD_0 
0x36 +#define QT1050_CSD_1 0x37 +#define QT1050_CSD_2 0x39 +#define QT1050_CSD_3 0x3a +#define QT1050_CSD_4 0x3b + +/* Low Power Mode */ +#define QT1050_LPMODE 0x3d + +/* Calibration and Reset */ +#define QT1050_RES_CAL 0x3f +#define QT1050_RES_CAL_RESET BIT(7) +#define QT1050_RES_CAL_CALIBRATE BIT(1) + +#define QT1050_MAX_KEYS 5 +#define QT1050_RESET_TIME 255 + +struct qt1050_key_regs { + unsigned int nthr; + unsigned int pulse_scale; + unsigned int di_aks; + unsigned int csd; +}; + +struct qt1050_key { + u32 num; + u32 charge_delay; + u32 thr_cnt; + u32 samples; + u32 scale; + u32 keycode; +}; + +struct qt1050_priv { + struct i2c_client *client; + struct input_dev *input; + struct regmap *regmap; + struct qt1050_key keys[QT1050_MAX_KEYS]; + unsigned short keycodes[QT1050_MAX_KEYS]; + u8 reg_keys; + u8 last_keys; +}; + +static const struct qt1050_key_regs qt1050_key_regs_data[] = { + { + .nthr = QT1050_NTHR_0, + .pulse_scale = QT1050_PULSE_SCALE_0, + .di_aks = QT1050_DI_AKS_0, + .csd = QT1050_CSD_0, + }, { + .nthr = QT1050_NTHR_1, + .pulse_scale = QT1050_PULSE_SCALE_1, + .di_aks = QT1050_DI_AKS_1, + .csd = QT1050_CSD_1, + }, { + .nthr = QT1050_NTHR_2, + .pulse_scale = QT1050_PULSE_SCALE_2, + .di_aks = QT1050_DI_AKS_2, + .csd = QT1050_CSD_2, + }, { + .nthr = QT1050_NTHR_3, + .pulse_scale = QT1050_PULSE_SCALE_3, + .di_aks = QT1050_DI_AKS_3, + .csd = QT1050_CSD_3, + }, { + .nthr = QT1050_NTHR_4, + .pulse_scale = QT1050_PULSE_SCALE_4, + .di_aks = QT1050_DI_AKS_4, + .csd = QT1050_CSD_4, + } +}; + +static bool qt1050_volatile_reg(struct device *dev, unsigned int reg) +{ + switch (reg) { + case QT1050_DET_STATUS: + case QT1050_KEY_STATUS: + case QT1050_KEY_SIGNAL_0_MSB: + case QT1050_KEY_SIGNAL_0_LSB: + case QT1050_KEY_SIGNAL_1_MSB: + case QT1050_KEY_SIGNAL_1_LSB: + case QT1050_KEY_SIGNAL_2_MSB: + case QT1050_KEY_SIGNAL_2_LSB: + case QT1050_KEY_SIGNAL_3_MSB: + case QT1050_KEY_SIGNAL_3_LSB: + case QT1050_KEY_SIGNAL_4_MSB: + case QT1050_KEY_SIGNAL_4_LSB: + return true; + default: + return false; + } +} + +static const struct regmap_range qt1050_readable_ranges[] = { + regmap_reg_range(QT1050_CHIP_ID, QT1050_KEY_STATUS), + regmap_reg_range(QT1050_KEY_SIGNAL_0_MSB, QT1050_KEY_SIGNAL_1_LSB), + regmap_reg_range(QT1050_KEY_SIGNAL_2_MSB, QT1050_KEY_SIGNAL_4_LSB), + regmap_reg_range(QT1050_REF_DATA_0_MSB, QT1050_REF_DATA_1_LSB), + regmap_reg_range(QT1050_REF_DATA_2_MSB, QT1050_REF_DATA_4_LSB), + regmap_reg_range(QT1050_NTHR_0, QT1050_NTHR_1), + regmap_reg_range(QT1050_NTHR_2, QT1050_NTHR_4), + regmap_reg_range(QT1050_PULSE_SCALE_0, QT1050_PULSE_SCALE_1), + regmap_reg_range(QT1050_PULSE_SCALE_2, QT1050_PULSE_SCALE_4), + regmap_reg_range(QT1050_DI_AKS_0, QT1050_DI_AKS_1), + regmap_reg_range(QT1050_DI_AKS_2, QT1050_DI_AKS_4), + regmap_reg_range(QT1050_CSD_0, QT1050_CSD_1), + regmap_reg_range(QT1050_CSD_2, QT1050_RES_CAL), +}; + +static const struct regmap_access_table qt1050_readable_table = { + .yes_ranges = qt1050_readable_ranges, + .n_yes_ranges = ARRAY_SIZE(qt1050_readable_ranges), +}; + +static const struct regmap_range qt1050_writeable_ranges[] = { + regmap_reg_range(QT1050_NTHR_0, QT1050_NTHR_1), + regmap_reg_range(QT1050_NTHR_2, QT1050_NTHR_4), + regmap_reg_range(QT1050_PULSE_SCALE_0, QT1050_PULSE_SCALE_1), + regmap_reg_range(QT1050_PULSE_SCALE_2, QT1050_PULSE_SCALE_4), + regmap_reg_range(QT1050_DI_AKS_0, QT1050_DI_AKS_1), + regmap_reg_range(QT1050_DI_AKS_2, QT1050_DI_AKS_4), + regmap_reg_range(QT1050_CSD_0, QT1050_CSD_1), + regmap_reg_range(QT1050_CSD_2, QT1050_RES_CAL), +}; + +static const struct 
regmap_access_table qt1050_writeable_table = { + .yes_ranges = qt1050_writeable_ranges, + .n_yes_ranges = ARRAY_SIZE(qt1050_writeable_ranges), +}; + +static struct regmap_config qt1050_regmap_config = { + .reg_bits = 8, + .val_bits = 8, + .max_register = QT1050_RES_CAL, + + .cache_type = REGCACHE_RBTREE, + + .wr_table = &qt1050_writeable_table, + .rd_table = &qt1050_readable_table, + .volatile_reg = qt1050_volatile_reg, +}; + +static bool qt1050_identify(struct qt1050_priv *ts) +{ + unsigned int val; + int err; + + /* Read Chip ID */ + regmap_read(ts->regmap, QT1050_CHIP_ID, &val); + if (val != QT1050_CHIP_ID_VER) { + dev_err(&ts->client->dev, "ID %d not supported\n", val); + return false; + } + + /* Read firmware version */ + err = regmap_read(ts->regmap, QT1050_FW_VERSION, &val); + if (err) { + dev_err(&ts->client->dev, "could not read the firmware version\n"); + return false; + } + + dev_info(&ts->client->dev, "AT42QT1050 firmware version %1d.%1d\n", + val >> 4, val & 0xf); + + return true; +} + +static irqreturn_t qt1050_irq_threaded(int irq, void *dev_id) +{ + struct qt1050_priv *ts = dev_id; + struct input_dev *input = ts->input; + unsigned long new_keys, changed; + unsigned int val; + int i, err; + + /* Read the detected status register, thus clearing interrupt */ + err = regmap_read(ts->regmap, QT1050_DET_STATUS, &val); + if (err) { + dev_err(&ts->client->dev, "Fail to read detection status: %d\n", + err); + return IRQ_NONE; + } + + /* Read which key changed, keys are not continuous */ + err = regmap_read(ts->regmap, QT1050_KEY_STATUS, &val); + if (err) { + dev_err(&ts->client->dev, + "Fail to determine the key status: %d\n", err); + return IRQ_NONE; + } + new_keys = (val & 0x70) >> 2 | (val & 0x6) >> 1; + changed = ts->last_keys ^ new_keys; + /* Report registered keys only */ + changed &= ts->reg_keys; + + for_each_set_bit(i, &changed, QT1050_MAX_KEYS) + input_report_key(input, ts->keys[i].keycode, + test_bit(i, &new_keys)); + + ts->last_keys = new_keys; + input_sync(input); + + return IRQ_HANDLED; +} + +static const struct qt1050_key_regs *qt1050_get_key_regs(int key_num) +{ + return &qt1050_key_regs_data[key_num]; +} + +static int qt1050_set_key(struct regmap *map, int number, int on) +{ + const struct qt1050_key_regs *key_regs; + + key_regs = qt1050_get_key_regs(number); + + return regmap_update_bits(map, key_regs->di_aks, 0xfc, + on ? 
BIT(4) : 0x00); +} + +static int qt1050_apply_fw_data(struct qt1050_priv *ts) +{ + struct regmap *map = ts->regmap; + struct qt1050_key *button = &ts->keys[0]; + const struct qt1050_key_regs *key_regs; + int i, err; + + /* Disable all keys and enable only the specified ones */ + for (i = 0; i < QT1050_MAX_KEYS; i++) { + err = qt1050_set_key(map, i, 0); + if (err) + return err; + } + + for (i = 0; i < QT1050_MAX_KEYS; i++, button++) { + /* Keep KEY_RESERVED keys off */ + if (button->keycode == KEY_RESERVED) + continue; + + err = qt1050_set_key(map, button->num, 1); + if (err) + return err; + + key_regs = qt1050_get_key_regs(button->num); + + err = regmap_write(map, key_regs->pulse_scale, + (button->samples << 4) | (button->scale)); + if (err) + return err; + err = regmap_write(map, key_regs->csd, button->charge_delay); + if (err) + return err; + err = regmap_write(map, key_regs->nthr, button->thr_cnt); + if (err) + return err; + } + + return 0; +} + +static int qt1050_parse_fw(struct qt1050_priv *ts) +{ + struct device *dev = &ts->client->dev; + struct fwnode_handle *child; + int nbuttons; + + nbuttons = device_get_child_node_count(dev); + if (nbuttons == 0 || nbuttons > QT1050_MAX_KEYS) + return -ENODEV; + + device_for_each_child_node(dev, child) { + struct qt1050_key button; + + /* Required properties */ + if (fwnode_property_read_u32(child, "linux,code", + &button.keycode)) { + dev_err(dev, "Button without keycode\n"); + goto err; + } + if (button.keycode >= KEY_MAX) { + dev_err(dev, "Invalid keycode 0x%x\n", + button.keycode); + goto err; + } + + if (fwnode_property_read_u32(child, "reg", + &button.num)) { + dev_err(dev, "Button without pad number\n"); + goto err; + } + if (button.num < 0 || button.num > QT1050_MAX_KEYS - 1) + goto err; + + ts->reg_keys |= BIT(button.num); + + /* Optional properties */ + if (fwnode_property_read_u32(child, + "microchip,pre-charge-time-ns", + &button.charge_delay)) { + button.charge_delay = 0; + } else { + if (button.charge_delay % 2500 == 0) + button.charge_delay = + button.charge_delay / 2500; + else + button.charge_delay = 0; + } + + if (fwnode_property_read_u32(child, "microchip,average-samples", + &button.samples)) { + button.samples = 0; + } else { + if (is_power_of_2(button.samples)) + button.samples = ilog2(button.samples); + else + button.samples = 0; + } + + if (fwnode_property_read_u32(child, "microchip,average-scaling", + &button.scale)) { + button.scale = 0; + } else { + if (is_power_of_2(button.scale)) + button.scale = ilog2(button.scale); + else + button.scale = 0; + + } + + if (fwnode_property_read_u32(child, "microchip,threshold", + &button.thr_cnt)) { + button.thr_cnt = 20; + } else { + if (button.thr_cnt > 255) + button.thr_cnt = 20; + } + + ts->keys[button.num] = button; + } + + return 0; + +err: + fwnode_handle_put(child); + return -EINVAL; +} + +static int qt1050_probe(struct i2c_client *client) +{ + struct qt1050_priv *ts; + struct input_dev *input; + struct device *dev = &client->dev; + struct regmap *map; + unsigned int status, i; + int err; + + /* Check basic functionality */ + err = i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_BYTE); + if (!err) { + dev_err(&client->dev, "%s adapter not supported\n", + dev_driver_string(&client->adapter->dev)); + return -ENODEV; + } + + if (!client->irq) { + dev_err(dev, "assign a irq line to this device\n"); + return -EINVAL; + } + + ts = devm_kzalloc(dev, sizeof(*ts), GFP_KERNEL); + if (!ts) + return -ENOMEM; + + input = devm_input_allocate_device(dev); + if (!input) + return 
-ENOMEM; + + map = devm_regmap_init_i2c(client, &qt1050_regmap_config); + if (IS_ERR(map)) + return PTR_ERR(map); + + ts->client = client; + ts->input = input; + ts->regmap = map; + + i2c_set_clientdata(client, ts); + + /* Identify the qt1050 chip */ + if (!qt1050_identify(ts)) + return -ENODEV; + + /* Get pdata */ + err = qt1050_parse_fw(ts); + if (err) { + dev_err(dev, "Failed to parse firmware: %d\n", err); + return err; + } + + input->name = "AT42QT1050 QTouch Sensor"; + input->dev.parent = &client->dev; + input->id.bustype = BUS_I2C; + + /* Add the keycode */ + input->keycode = ts->keycodes; + input->keycodesize = sizeof(ts->keycodes[0]); + input->keycodemax = QT1050_MAX_KEYS; + + __set_bit(EV_KEY, input->evbit); + for (i = 0; i < QT1050_MAX_KEYS; i++) { + ts->keycodes[i] = ts->keys[i].keycode; + __set_bit(ts->keycodes[i], input->keybit); + } + + /* Trigger re-calibration */ + err = regmap_update_bits(ts->regmap, QT1050_RES_CAL, 0x7f, + QT1050_RES_CAL_CALIBRATE); + if (err) { + dev_err(dev, "Trigger calibration failed: %d\n", err); + return err; + } + err = regmap_read_poll_timeout(ts->regmap, QT1050_DET_STATUS, status, + status >> 7 == 1, 10000, 200000); + if (err) { + dev_err(dev, "Calibration failed: %d\n", err); + return err; + } + + /* Soft reset to set defaults */ + err = regmap_update_bits(ts->regmap, QT1050_RES_CAL, + QT1050_RES_CAL_RESET, QT1050_RES_CAL_RESET); + if (err) { + dev_err(dev, "Trigger soft reset failed: %d\n", err); + return err; + } + msleep(QT1050_RESET_TIME); + + /* Set pdata */ + err = qt1050_apply_fw_data(ts); + if (err) { + dev_err(dev, "Failed to set firmware data: %d\n", err); + return err; + } + + err = devm_request_threaded_irq(dev, client->irq, NULL, + qt1050_irq_threaded, IRQF_ONESHOT, + "qt1050", ts); + if (err) { + dev_err(&client->dev, "Failed to request irq: %d\n", err); + return err; + } + + /* Clear #CHANGE line */ + err = regmap_read(ts->regmap, QT1050_DET_STATUS, &status); + if (err) { + dev_err(dev, "Failed to clear #CHANGE line level: %d\n", err); + return err; + } + + /* Register the input device */ + err = input_register_device(ts->input); + if (err) { + dev_err(&client->dev, "Failed to register input device: %d\n", + err); + return err; + } + + return 0; +} + +static int __maybe_unused qt1050_suspend(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + struct qt1050_priv *ts = i2c_get_clientdata(client); + + disable_irq(client->irq); + + /* + * Set measurement interval to 1s (125 x 8ms) if wakeup is allowed + * else turn off. The 1s interval seems to be a good compromise between + * low power and response time. + */ + return regmap_write(ts->regmap, QT1050_LPMODE, + device_may_wakeup(dev) ? 
125 : 0); +} + +static int __maybe_unused qt1050_resume(struct device *dev) +{ + struct i2c_client *client = to_i2c_client(dev); + struct qt1050_priv *ts = i2c_get_clientdata(client); + + enable_irq(client->irq); + + /* Set measurement interval back to 16ms (2 x 8ms) */ + return regmap_write(ts->regmap, QT1050_LPMODE, 2); +} + +static SIMPLE_DEV_PM_OPS(qt1050_pm_ops, qt1050_suspend, qt1050_resume); + +static const struct of_device_id __maybe_unused qt1050_of_match[] = { + { .compatible = "microchip,qt1050", }, + { }, +}; +MODULE_DEVICE_TABLE(of, qt1050_of_match); + +static struct i2c_driver qt1050_driver = { + .driver = { + .name = "qt1050", + .of_match_table = of_match_ptr(qt1050_of_match), + .pm = &qt1050_pm_ops, + }, + .probe_new = qt1050_probe, +}; + +module_i2c_driver(qt1050_driver); + +MODULE_AUTHOR("Marco Felsch <kernel@pengutronix.de>"); +MODULE_DESCRIPTION("Driver for AT42QT1050 QTouch sensor"); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/input/keyboard/snvs_pwrkey.c b/drivers/input/keyboard/snvs_pwrkey.c index 4c67cf30a5d9..5342d8d45f81 100644 --- a/drivers/input/keyboard/snvs_pwrkey.c +++ b/drivers/input/keyboard/snvs_pwrkey.c @@ -15,6 +15,7 @@ #include <linux/of.h> #include <linux/of_address.h> #include <linux/platform_device.h> +#include <linux/pm_wakeirq.h> #include <linux/mfd/syscon.h> #include <linux/regmap.h> @@ -167,28 +168,9 @@ static int imx_snvs_pwrkey_probe(struct platform_device *pdev) } device_init_wakeup(&pdev->dev, pdata->wakeup); - - return 0; -} - -static int __maybe_unused imx_snvs_pwrkey_suspend(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct pwrkey_drv_data *pdata = platform_get_drvdata(pdev); - - if (device_may_wakeup(&pdev->dev)) - enable_irq_wake(pdata->irq); - - return 0; -} - -static int __maybe_unused imx_snvs_pwrkey_resume(struct device *dev) -{ - struct platform_device *pdev = to_platform_device(dev); - struct pwrkey_drv_data *pdata = platform_get_drvdata(pdev); - - if (device_may_wakeup(&pdev->dev)) - disable_irq_wake(pdata->irq); + error = dev_pm_set_wake_irq(&pdev->dev, pdata->irq); + if (error) + dev_err(&pdev->dev, "irq wake enable failed.\n"); return 0; } @@ -199,13 +181,9 @@ static const struct of_device_id imx_snvs_pwrkey_ids[] = { }; MODULE_DEVICE_TABLE(of, imx_snvs_pwrkey_ids); -static SIMPLE_DEV_PM_OPS(imx_snvs_pwrkey_pm_ops, imx_snvs_pwrkey_suspend, - imx_snvs_pwrkey_resume); - static struct platform_driver imx_snvs_pwrkey_driver = { .driver = { .name = "snvs_pwrkey", - .pm = &imx_snvs_pwrkey_pm_ops, .of_match_table = imx_snvs_pwrkey_ids, }, .probe = imx_snvs_pwrkey_probe, diff --git a/drivers/input/keyboard/sun4i-lradc-keys.c b/drivers/input/keyboard/sun4i-lradc-keys.c index 57272df34cd5..df3eec72a9b2 100644 --- a/drivers/input/keyboard/sun4i-lradc-keys.c +++ b/drivers/input/keyboard/sun4i-lradc-keys.c @@ -46,6 +46,7 @@ #define CONTINUE_TIME_SEL(x) ((x) << 16) /* 4 bits */ #define KEY_MODE_SEL(x) ((x) << 12) /* 2 bits */ #define LEVELA_B_CNT(x) ((x) << 8) /* 4 bits */ +#define HOLD_KEY_EN(x) ((x) << 7) #define HOLD_EN(x) ((x) << 6) #define LEVELB_VOL(x) ((x) << 4) /* 2 bits */ #define SAMPLE_RATE(x) ((x) << 2) /* 2 bits */ @@ -63,6 +64,25 @@ #define CHAN0_KEYDOWN_IRQ BIT(1) #define CHAN0_DATA_IRQ BIT(0) +/* struct lradc_variant - Describe sun4i-a10-lradc-keys hardware variant + * @divisor_numerator: The numerator of the lradc's internal Vref divider + * @divisor_denominator: The denominator of the lradc's internal Vref divider + */ +struct lradc_variant { + u8 divisor_numerator; + u8 
divisor_denominator; +}; + +static const struct lradc_variant lradc_variant_a10 = { + .divisor_numerator = 2, + .divisor_denominator = 3 +}; + +static const struct lradc_variant r_lradc_variant_a83t = { + .divisor_numerator = 3, + .divisor_denominator = 4 +}; + struct sun4i_lradc_keymap { u32 voltage; u32 keycode; @@ -74,6 +94,7 @@ struct sun4i_lradc_data { void __iomem *base; struct regulator *vref_supply; struct sun4i_lradc_keymap *chan0_map; + const struct lradc_variant *variant; u32 chan0_map_count; u32 chan0_keycode; u32 vref; @@ -128,9 +149,9 @@ static int sun4i_lradc_open(struct input_dev *dev) if (error) return error; - /* lradc Vref internally is divided by 2/3 */ - lradc->vref = regulator_get_voltage(lradc->vref_supply) * 2 / 3; - + lradc->vref = regulator_get_voltage(lradc->vref_supply) * + lradc->variant->divisor_numerator / + lradc->variant->divisor_denominator; /* * Set sample time to 4 ms / 250 Hz. Wait 2 * 4 ms for key to * stabilize on press, wait (1 + 1) * 4 ms for key release @@ -222,6 +243,12 @@ static int sun4i_lradc_probe(struct platform_device *pdev) if (error) return error; + lradc->variant = of_device_get_match_data(&pdev->dev); + if (!lradc->variant) { + dev_err(&pdev->dev, "Missing sun4i-a10-lradc-keys variant\n"); + return -EINVAL; + } + lradc->vref_supply = devm_regulator_get(dev, "vref"); if (IS_ERR(lradc->vref_supply)) return PTR_ERR(lradc->vref_supply); @@ -265,7 +292,10 @@ static int sun4i_lradc_probe(struct platform_device *pdev) } static const struct of_device_id sun4i_lradc_of_match[] = { - { .compatible = "allwinner,sun4i-a10-lradc-keys", }, + { .compatible = "allwinner,sun4i-a10-lradc-keys", + .data = &lradc_variant_a10 }, + { .compatible = "allwinner,sun8i-a83t-r-lradc", + .data = &r_lradc_variant_a83t }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sun4i_lradc_of_match); diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig index e15ed1bb8558..6dfe9e2fe5b1 100644 --- a/drivers/input/misc/Kconfig +++ b/drivers/input/misc/Kconfig @@ -290,6 +290,18 @@ config INPUT_GPIO_DECODER To compile this driver as a module, choose M here: the module will be called gpio_decoder. +config INPUT_GPIO_VIBRA + tristate "GPIO vibrator support" + depends on GPIOLIB || COMPILE_TEST + select INPUT_FF_MEMLESS + help + Say Y here to get support for GPIO based vibrator devices. + + If unsure, say N. + + To compile this driver as a module, choose M here: the module will be + called gpio-vibra. + config INPUT_IXP4XX_BEEPER tristate "IXP4XX Beeper support" depends on ARCH_IXP4XX diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile index b936c5b1d4ac..f38ebbdb05e2 100644 --- a/drivers/input/misc/Makefile +++ b/drivers/input/misc/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_INPUT_DRV2667_HAPTICS) += drv2667.o obj-$(CONFIG_INPUT_GP2A) += gp2ap002a00f.o obj-$(CONFIG_INPUT_GPIO_BEEPER) += gpio-beeper.o obj-$(CONFIG_INPUT_GPIO_DECODER) += gpio_decoder.o +obj-$(CONFIG_INPUT_GPIO_VIBRA) += gpio-vibra.o obj-$(CONFIG_INPUT_HISI_POWERKEY) += hisi_powerkey.o obj-$(CONFIG_HP_SDC_RTC) += hp_sdc_rtc.o obj-$(CONFIG_INPUT_IMS_PCU) += ims-pcu.o diff --git a/drivers/input/misc/gpio-vibra.c b/drivers/input/misc/gpio-vibra.c new file mode 100644 index 000000000000..f79f75595dd7 --- /dev/null +++ b/drivers/input/misc/gpio-vibra.c @@ -0,0 +1,207 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * GPIO vibrator driver + * + * Copyright (C) 2019 Luca Weiss <luca@z3ntu.xyz> + * + * Based on PWM vibrator driver: + * Copyright (C) 2017 Collabora Ltd. 
+ * + * Based on previous work from: + * Copyright (C) 2012 Dmitry Torokhov <dmitry.torokhov@gmail.com> + * + * Based on PWM beeper driver: + * Copyright (C) 2010, Lars-Peter Clausen <lars@metafoo.de> + */ + +#include <linux/gpio/consumer.h> +#include <linux/input.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/of_device.h> +#include <linux/platform_device.h> +#include <linux/property.h> +#include <linux/regulator/consumer.h> +#include <linux/slab.h> + +struct gpio_vibrator { + struct input_dev *input; + struct gpio_desc *gpio; + struct regulator *vcc; + + struct work_struct play_work; + bool running; + bool vcc_on; +}; + +static int gpio_vibrator_start(struct gpio_vibrator *vibrator) +{ + struct device *pdev = vibrator->input->dev.parent; + int err; + + if (!vibrator->vcc_on) { + err = regulator_enable(vibrator->vcc); + if (err) { + dev_err(pdev, "failed to enable regulator: %d\n", err); + return err; + } + vibrator->vcc_on = true; + } + + gpiod_set_value_cansleep(vibrator->gpio, 1); + + return 0; +} + +static void gpio_vibrator_stop(struct gpio_vibrator *vibrator) +{ + gpiod_set_value_cansleep(vibrator->gpio, 0); + + if (vibrator->vcc_on) { + regulator_disable(vibrator->vcc); + vibrator->vcc_on = false; + } +} + +static void gpio_vibrator_play_work(struct work_struct *work) +{ + struct gpio_vibrator *vibrator = + container_of(work, struct gpio_vibrator, play_work); + + if (vibrator->running) + gpio_vibrator_start(vibrator); + else + gpio_vibrator_stop(vibrator); +} + +static int gpio_vibrator_play_effect(struct input_dev *dev, void *data, + struct ff_effect *effect) +{ + struct gpio_vibrator *vibrator = input_get_drvdata(dev); + int level; + + level = effect->u.rumble.strong_magnitude; + if (!level) + level = effect->u.rumble.weak_magnitude; + + vibrator->running = level; + schedule_work(&vibrator->play_work); + + return 0; +} + +static void gpio_vibrator_close(struct input_dev *input) +{ + struct gpio_vibrator *vibrator = input_get_drvdata(input); + + cancel_work_sync(&vibrator->play_work); + gpio_vibrator_stop(vibrator); + vibrator->running = false; +} + +static int gpio_vibrator_probe(struct platform_device *pdev) +{ + struct gpio_vibrator *vibrator; + int err; + + vibrator = devm_kzalloc(&pdev->dev, sizeof(*vibrator), GFP_KERNEL); + if (!vibrator) + return -ENOMEM; + + vibrator->input = devm_input_allocate_device(&pdev->dev); + if (!vibrator->input) + return -ENOMEM; + + vibrator->vcc = devm_regulator_get(&pdev->dev, "vcc"); + err = PTR_ERR_OR_ZERO(vibrator->vcc); + if (err) { + if (err != -EPROBE_DEFER) + dev_err(&pdev->dev, "Failed to request regulator: %d\n", + err); + return err; + } + + vibrator->gpio = devm_gpiod_get(&pdev->dev, "enable", GPIOD_OUT_LOW); + err = PTR_ERR_OR_ZERO(vibrator->gpio); + if (err) { + if (err != -EPROBE_DEFER) + dev_err(&pdev->dev, "Failed to request main gpio: %d\n", + err); + return err; + } + + INIT_WORK(&vibrator->play_work, gpio_vibrator_play_work); + + vibrator->input->name = "gpio-vibrator"; + vibrator->input->id.bustype = BUS_HOST; + vibrator->input->close = gpio_vibrator_close; + + input_set_drvdata(vibrator->input, vibrator); + input_set_capability(vibrator->input, EV_FF, FF_RUMBLE); + + err = input_ff_create_memless(vibrator->input, NULL, + gpio_vibrator_play_effect); + if (err) { + dev_err(&pdev->dev, "Couldn't create FF dev: %d\n", err); + return err; + } + + err = input_register_device(vibrator->input); + if (err) { + dev_err(&pdev->dev, "Couldn't register input dev: %d\n", err); + return err; + } + + 
platform_set_drvdata(pdev, vibrator); + + return 0; +} + +static int __maybe_unused gpio_vibrator_suspend(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct gpio_vibrator *vibrator = platform_get_drvdata(pdev); + + cancel_work_sync(&vibrator->play_work); + if (vibrator->running) + gpio_vibrator_stop(vibrator); + + return 0; +} + +static int __maybe_unused gpio_vibrator_resume(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct gpio_vibrator *vibrator = platform_get_drvdata(pdev); + + if (vibrator->running) + gpio_vibrator_start(vibrator); + + return 0; +} + +static SIMPLE_DEV_PM_OPS(gpio_vibrator_pm_ops, + gpio_vibrator_suspend, gpio_vibrator_resume); + +#ifdef CONFIG_OF +static const struct of_device_id gpio_vibra_dt_match_table[] = { + { .compatible = "gpio-vibrator" }, + {} +}; +MODULE_DEVICE_TABLE(of, gpio_vibra_dt_match_table); +#endif + +static struct platform_driver gpio_vibrator_driver = { + .probe = gpio_vibrator_probe, + .driver = { + .name = "gpio-vibrator", + .pm = &gpio_vibrator_pm_ops, + .of_match_table = of_match_ptr(gpio_vibra_dt_match_table), + }, +}; +module_platform_driver(gpio_vibrator_driver); + +MODULE_AUTHOR("Luca Weiss <luca@z3ntu.xyz>"); +MODULE_DESCRIPTION("GPIO vibrator driver"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:gpio-vibrator"); diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c index d3ff1fc09af7..94f7ca5ad077 100644 --- a/drivers/input/mouse/psmouse-base.c +++ b/drivers/input/mouse/psmouse-base.c @@ -373,6 +373,8 @@ static irqreturn_t psmouse_interrupt(struct serio *serio, if (ps2_handle_response(&psmouse->ps2dev, data)) goto out; + pm_wakeup_event(&serio->dev, 0); + if (psmouse->state <= PSMOUSE_RESYNCING) goto out; diff --git a/drivers/input/rmi4/rmi_f54.c b/drivers/input/rmi4/rmi_f54.c index a6f515bcab22..516fea06ed59 100644 --- a/drivers/input/rmi4/rmi_f54.c +++ b/drivers/input/rmi4/rmi_f54.c @@ -456,25 +456,15 @@ static int rmi_f54_vidioc_fmt(struct file *file, void *priv, static int rmi_f54_vidioc_enum_fmt(struct file *file, void *priv, struct v4l2_fmtdesc *fmt) { + struct f54_data *f54 = video_drvdata(file); + if (fmt->type != V4L2_BUF_TYPE_VIDEO_CAPTURE) return -EINVAL; - switch (fmt->index) { - case 0: - fmt->pixelformat = V4L2_TCH_FMT_DELTA_TD16; - break; - - case 1: - fmt->pixelformat = V4L2_TCH_FMT_DELTA_TD08; - break; - - case 2: - fmt->pixelformat = V4L2_TCH_FMT_TU16; - break; - - default: + if (fmt->index) return -EINVAL; - } + + fmt->pixelformat = f54->format.pixelformat; return 0; } @@ -692,6 +682,7 @@ static int rmi_f54_probe(struct rmi_function *fn) return -ENOMEM; rmi_f54_create_input_map(f54); + rmi_f54_set_input(f54, 0); /* register video device */ strlcpy(f54->v4l2.name, F54_NAME, sizeof(f54->v4l2.name)); diff --git a/drivers/input/serio/Kconfig b/drivers/input/serio/Kconfig index c9c7224d5ae0..bfe436ccb046 100644 --- a/drivers/input/serio/Kconfig +++ b/drivers/input/serio/Kconfig @@ -254,6 +254,7 @@ config SERIO_APBPS2 config SERIO_OLPC_APSP tristate "OLPC AP-SP input support" + depends on ARCH_MMP || COMPILE_TEST help Say Y here if you want support for the keyboard and touchpad included in the OLPC XO-1.75 and XO-4 laptops.
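The gpio-vibra driver added above deliberately plugs into the kernel's memless force-feedback core rather than defining a private interface, so userspace drives it through the generic evdev rumble protocol: upload an FF_RUMBLE effect with EVIOCSFF, then start it by writing an EV_FF event carrying the effect id the kernel assigned. The sketch below illustrates that flow; it is not part of the patch, the /dev/input/event5 path is a placeholder for the vibrator's event node, and real code would discover the node by probing EVIOCGBIT for FF_RUMBLE support. Note that the driver's play_effect handler only distinguishes zero from nonzero magnitude, so any nonzero strong or weak magnitude simply turns the motor on.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>

int main(void)
{
	struct ff_effect effect;
	struct input_event play;
	/* Placeholder node; probe EVIOCGBIT(EV_FF, ...) to find the real one. */
	int fd = open("/dev/input/event5", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Upload a rumble effect; gpio-vibra reduces the magnitudes to a
	 * plain on/off decision, so this just spins the motor. */
	memset(&effect, 0, sizeof(effect));
	effect.type = FF_RUMBLE;
	effect.id = -1;			/* let the kernel assign an id */
	effect.u.rumble.strong_magnitude = 0xc000;
	effect.replay.length = 1000;	/* play for 1000 ms */

	if (ioctl(fd, EVIOCSFF, &effect) < 0) {
		perror("EVIOCSFF");
		return 1;
	}

	/* Start playback by writing an EV_FF event with the effect id. */
	memset(&play, 0, sizeof(play));
	play.type = EV_FF;
	play.code = effect.id;
	play.value = 1;

	if (write(fd, &play, sizeof(play)) != sizeof(play)) {
		perror("write");
		return 1;
	}

	sleep(2);			/* let the effect run out */
	close(fd);
	return 0;
}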
diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c index a8b9be3e28db..7935e52b5435 100644 --- a/drivers/input/serio/hyperv-keyboard.c +++ b/drivers/input/serio/hyperv-keyboard.c @@ -440,5 +440,7 @@ static void __exit hv_kbd_exit(void) } MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Microsoft Hyper-V Synthetic Keyboard Driver"); + module_init(hv_kbd_init); module_exit(hv_kbd_exit); diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c index 95a78ccbd847..6462f1798fbb 100644 --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -573,9 +573,6 @@ static irqreturn_t i8042_interrupt(int irq, void *dev_id) port = &i8042_ports[port_no]; serio = port->exists ? port->serio : NULL; - if (irq && serio) - pm_wakeup_event(&serio->dev, 0); - filter_dbg(port->driver_bound, data, "<- i8042 (interrupt, %d, %d%s%s)\n", port_no, irq, dfl & SERIO_PARITY ? ", bad parity" : "", diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c index e6a07e68d1ff..22b8e05aa36c 100644 --- a/drivers/input/serio/libps2.c +++ b/drivers/input/serio/libps2.c @@ -409,6 +409,7 @@ bool ps2_handle_ack(struct ps2dev *ps2dev, u8 data) ps2dev->nak = PS2_RET_ERR; break; } + /* Fall through */ /* * Workaround for mice which don't ACK the Get ID command. diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig index 7a4884ad198b..a2029c3235af 100644 --- a/drivers/input/touchscreen/Kconfig +++ b/drivers/input/touchscreen/Kconfig @@ -1312,4 +1312,14 @@ config TOUCHSCREEN_ROHM_BU21023 To compile this driver as a module, choose M here: the module will be called bu21023_ts. +config TOUCHSCREEN_IQS5XX + tristate "Azoteq IQS550/572/525 trackpad/touchscreen controller" + depends on I2C + help + Say Y to enable support for the Azoteq IQS550/572/525 + family of trackpad/touchscreen controllers. + + To compile this driver as a module, choose M here: the + module will be called iqs5xx. + endif diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile index fcc7605fba8d..084a596a0c8b 100644 --- a/drivers/input/touchscreen/Makefile +++ b/drivers/input/touchscreen/Makefile @@ -110,3 +110,4 @@ obj-$(CONFIG_TOUCHSCREEN_ZFORCE) += zforce_ts.o obj-$(CONFIG_TOUCHSCREEN_COLIBRI_VF50) += colibri-vf50-ts.o obj-$(CONFIG_TOUCHSCREEN_ROHM_BU21023) += rohm_bu21023.o obj-$(CONFIG_TOUCHSCREEN_RASPBERRYPI_FW) += raspberrypi-ts.o +obj-$(CONFIG_TOUCHSCREEN_IQS5XX) += iqs5xx.o diff --git a/drivers/input/touchscreen/edt-ft5x06.c b/drivers/input/touchscreen/edt-ft5x06.c index 702bfda7ee77..c639ebce914c 100644 --- a/drivers/input/touchscreen/edt-ft5x06.c +++ b/drivers/input/touchscreen/edt-ft5x06.c @@ -1,20 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2012 Simon Budig, <simon.budig@kernelconcepts.de> * Daniel Wagener <daniel.wagener@kernelconcepts.de> (M09 firmware support) * Lothar Waßmann <LW@KARO-electronics.de> (DT support) - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- * - * You should have received a copy of the GNU General Public - * License along with this library; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ /* @@ -39,7 +27,6 @@ #include <linux/gpio/consumer.h> #include <linux/input/mt.h> #include <linux/input/touchscreen.h> -#include <linux/of_device.h> #define WORK_REGISTER_THRESHOLD 0x00 #define WORK_REGISTER_REPORT_RATE 0x08 @@ -1073,7 +1060,7 @@ static int edt_ft5x06_ts_probe(struct i2c_client *client, return -ENOMEM; } - chip_data = of_device_get_match_data(&client->dev); + chip_data = device_get_match_data(&client->dev); if (!chip_data) chip_data = (const struct edt_i2c_chip_data *)id->driver_data; if (!chip_data || !chip_data->max_support_points) { @@ -1254,7 +1241,6 @@ static const struct i2c_device_id edt_ft5x06_ts_id[] = { }; MODULE_DEVICE_TABLE(i2c, edt_ft5x06_ts_id); -#ifdef CONFIG_OF static const struct of_device_id edt_ft5x06_of_match[] = { { .compatible = "edt,edt-ft5206", .data = &edt_ft5x06_data }, { .compatible = "edt,edt-ft5306", .data = &edt_ft5x06_data }, @@ -1266,12 +1252,11 @@ static const struct of_device_id edt_ft5x06_of_match[] = { { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, edt_ft5x06_of_match); -#endif static struct i2c_driver edt_ft5x06_ts_driver = { .driver = { .name = "edt_ft5x06", - .of_match_table = of_match_ptr(edt_ft5x06_of_match), + .of_match_table = edt_ft5x06_of_match, .pm = &edt_ft5x06_ts_pm_ops, }, .id_table = edt_ft5x06_ts_id, @@ -1283,4 +1268,4 @@ module_i2c_driver(edt_ft5x06_ts_driver); MODULE_AUTHOR("Simon Budig <simon.budig@kernelconcepts.de>"); MODULE_DESCRIPTION("EDT FT5x06 I2C Touchscreen Driver"); -MODULE_LICENSE("GPL"); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/input/touchscreen/goodix.c b/drivers/input/touchscreen/goodix.c index f57d82220a88..f7c1d168dd89 100644 --- a/drivers/input/touchscreen/goodix.c +++ b/drivers/input/touchscreen/goodix.c @@ -27,6 +27,7 @@ #include <linux/delay.h> #include <linux/irq.h> #include <linux/interrupt.h> +#include <linux/regulator/consumer.h> #include <linux/slab.h> #include <linux/acpi.h> #include <linux/of.h> @@ -47,6 +48,8 @@ struct goodix_ts_data { struct touchscreen_properties prop; unsigned int max_touch_num; unsigned int int_trigger_type; + struct regulator *avdd28; + struct regulator *vddio; struct gpio_desc *gpiod_int; struct gpio_desc *gpiod_rst; u16 id; @@ -216,6 +219,7 @@ static const struct goodix_chip_data *goodix_get_chip_data(u16 id) { switch (id) { case 1151: + case 5663: case 5688: return >1x_chip_data; @@ -532,6 +536,24 @@ static int goodix_get_gpio_config(struct goodix_ts_data *ts) return -EINVAL; dev = &ts->client->dev; + ts->avdd28 = devm_regulator_get(dev, "AVDD28"); + if (IS_ERR(ts->avdd28)) { + error = PTR_ERR(ts->avdd28); + if (error != -EPROBE_DEFER) + dev_err(dev, + "Failed to get AVDD28 regulator: %d\n", error); + return error; + } + + ts->vddio = devm_regulator_get(dev, "VDDIO"); + if (IS_ERR(ts->vddio)) { + error = PTR_ERR(ts->vddio); + if (error != -EPROBE_DEFER) + dev_err(dev, + "Failed to get VDDIO regulator: %d\n", error); + return error; + } + /* Get the interrupt GPIO pin number */ gpiod = devm_gpiod_get_optional(dev, GOODIX_GPIO_INT_NAME, GPIOD_IN); if (IS_ERR(gpiod)) { @@ -764,6 +786,14 @@ err_release_cfg: complete_all(&ts->firmware_loading_complete); } +static void goodix_disable_regulators(void *arg) +{ + struct goodix_ts_data *ts = arg; + + regulator_disable(ts->vddio); + regulator_disable(ts->avdd28); +} + static int goodix_ts_probe(struct 
i2c_client *client, const struct i2c_device_id *id) { @@ -789,6 +819,29 @@ static int goodix_ts_probe(struct i2c_client *client, if (error) return error; + /* power up the controller */ + error = regulator_enable(ts->avdd28); + if (error) { + dev_err(&client->dev, + "Failed to enable AVDD28 regulator: %d\n", + error); + return error; + } + + error = regulator_enable(ts->vddio); + if (error) { + dev_err(&client->dev, + "Failed to enable VDDIO regulator: %d\n", + error); + regulator_disable(ts->avdd28); + return error; + } + + error = devm_add_action_or_reset(&client->dev, + goodix_disable_regulators, ts); + if (error) + return error; + if (ts->gpiod_int && ts->gpiod_rst) { /* reset the controller */ error = goodix_reset(ts); @@ -945,6 +998,7 @@ MODULE_DEVICE_TABLE(acpi, goodix_acpi_match); #ifdef CONFIG_OF static const struct of_device_id goodix_of_match[] = { { .compatible = "goodix,gt1151" }, + { .compatible = "goodix,gt5663" }, { .compatible = "goodix,gt5688" }, { .compatible = "goodix,gt911" }, { .compatible = "goodix,gt9110" }, diff --git a/drivers/input/touchscreen/iqs5xx.c b/drivers/input/touchscreen/iqs5xx.c new file mode 100644 index 000000000000..b832fe062645 --- /dev/null +++ b/drivers/input/touchscreen/iqs5xx.c @@ -0,0 +1,1133 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Azoteq IQS550/572/525 Trackpad/Touchscreen Controller + * + * Copyright (C) 2018 + * Author: Jeff LaBundy <jeff@labundy.com> + * + * These devices require firmware exported from a PC-based configuration tool + * made available by the vendor. Firmware files may be pushed to the device's + * nonvolatile memory by writing the filename to the 'fw_file' sysfs control. + * + * Link to PC-based configuration tool and data sheet: http://www.azoteq.com/ + */ + +#include <linux/delay.h> +#include <linux/device.h> +#include <linux/err.h> +#include <linux/firmware.h> +#include <linux/gpio/consumer.h> +#include <linux/i2c.h> +#include <linux/input.h> +#include <linux/input/mt.h> +#include <linux/input/touchscreen.h> +#include <linux/interrupt.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/of_device.h> +#include <linux/slab.h> +#include <asm/unaligned.h> + +#define IQS5XX_FW_FILE_LEN 64 +#define IQS5XX_NUM_RETRIES 10 +#define IQS5XX_NUM_POINTS 256 +#define IQS5XX_NUM_CONTACTS 5 +#define IQS5XX_WR_BYTES_MAX 2 + +#define IQS5XX_PROD_NUM_IQS550 40 +#define IQS5XX_PROD_NUM_IQS572 58 +#define IQS5XX_PROD_NUM_IQS525 52 +#define IQS5XX_PROJ_NUM_A000 0 +#define IQS5XX_PROJ_NUM_B000 15 +#define IQS5XX_MAJOR_VER_MIN 2 + +#define IQS5XX_RESUME 0x00 +#define IQS5XX_SUSPEND 0x01 + +#define IQS5XX_SW_INPUT_EVENT 0x10 +#define IQS5XX_SETUP_COMPLETE 0x40 +#define IQS5XX_EVENT_MODE 0x01 +#define IQS5XX_TP_EVENT 0x04 + +#define IQS5XX_FLIP_X 0x01 +#define IQS5XX_FLIP_Y 0x02 +#define IQS5XX_SWITCH_XY_AXIS 0x04 + +#define IQS5XX_PROD_NUM 0x0000 +#define IQS5XX_ABS_X 0x0016 +#define IQS5XX_ABS_Y 0x0018 +#define IQS5XX_SYS_CTRL0 0x0431 +#define IQS5XX_SYS_CTRL1 0x0432 +#define IQS5XX_SYS_CFG0 0x058E +#define IQS5XX_SYS_CFG1 0x058F +#define IQS5XX_TOTAL_RX 0x063D +#define IQS5XX_TOTAL_TX 0x063E +#define IQS5XX_XY_CFG0 0x0669 +#define IQS5XX_X_RES 0x066E +#define IQS5XX_Y_RES 0x0670 +#define IQS5XX_CHKSM 0x83C0 +#define IQS5XX_APP 0x8400 +#define IQS5XX_CSTM 0xBE00 +#define IQS5XX_PMAP_END 0xBFFF +#define IQS5XX_END_COMM 0xEEEE + +#define IQS5XX_CHKSM_LEN (IQS5XX_APP - IQS5XX_CHKSM) +#define IQS5XX_APP_LEN (IQS5XX_CSTM - IQS5XX_APP) +#define IQS5XX_CSTM_LEN (IQS5XX_PMAP_END + 1 - IQS5XX_CSTM) +#define IQS5XX_PMAP_LEN 
(IQS5XX_PMAP_END + 1 - IQS5XX_CHKSM) + +#define IQS5XX_REC_HDR_LEN 4 +#define IQS5XX_REC_LEN_MAX 255 +#define IQS5XX_REC_TYPE_DATA 0x00 +#define IQS5XX_REC_TYPE_EOF 0x01 + +#define IQS5XX_BL_ADDR_MASK 0x40 +#define IQS5XX_BL_CMD_VER 0x00 +#define IQS5XX_BL_CMD_READ 0x01 +#define IQS5XX_BL_CMD_EXEC 0x02 +#define IQS5XX_BL_CMD_CRC 0x03 +#define IQS5XX_BL_BLK_LEN_MAX 64 +#define IQS5XX_BL_ID 0x0200 +#define IQS5XX_BL_STATUS_RESET 0x00 +#define IQS5XX_BL_STATUS_AVAIL 0xA5 +#define IQS5XX_BL_STATUS_NONE 0xEE +#define IQS5XX_BL_CRC_PASS 0x00 +#define IQS5XX_BL_CRC_FAIL 0x01 +#define IQS5XX_BL_ATTEMPTS 3 + +struct iqs5xx_private { + struct i2c_client *client; + struct input_dev *input; + struct gpio_desc *reset_gpio; + struct mutex lock; + u8 bl_status; +}; + +struct iqs5xx_dev_id_info { + __be16 prod_num; + __be16 proj_num; + u8 major_ver; + u8 minor_ver; + u8 bl_status; +} __packed; + +struct iqs5xx_ihex_rec { + char start; + char len[2]; + char addr[4]; + char type[2]; + char data[2]; +} __packed; + +struct iqs5xx_touch_data { + __be16 abs_x; + __be16 abs_y; + __be16 strength; + u8 area; +} __packed; + +static int iqs5xx_read_burst(struct i2c_client *client, + u16 reg, void *val, u16 len) +{ + __be16 reg_buf = cpu_to_be16(reg); + int ret, i; + struct i2c_msg msg[] = { + { + .addr = client->addr, + .flags = 0, + .len = sizeof(reg_buf), + .buf = (u8 *)®_buf, + }, + { + .addr = client->addr, + .flags = I2C_M_RD, + .len = len, + .buf = (u8 *)val, + }, + }; + + /* + * The first addressing attempt outside of a communication window fails + * and must be retried, after which the device clock stretches until it + * is available. + */ + for (i = 0; i < IQS5XX_NUM_RETRIES; i++) { + ret = i2c_transfer(client->adapter, msg, ARRAY_SIZE(msg)); + if (ret == ARRAY_SIZE(msg)) + return 0; + + usleep_range(200, 300); + } + + if (ret >= 0) + ret = -EIO; + + dev_err(&client->dev, "Failed to read from address 0x%04X: %d\n", + reg, ret); + + return ret; +} + +static int iqs5xx_read_word(struct i2c_client *client, u16 reg, u16 *val) +{ + __be16 val_buf; + int error; + + error = iqs5xx_read_burst(client, reg, &val_buf, sizeof(val_buf)); + if (error) + return error; + + *val = be16_to_cpu(val_buf); + + return 0; +} + +static int iqs5xx_read_byte(struct i2c_client *client, u16 reg, u8 *val) +{ + return iqs5xx_read_burst(client, reg, val, sizeof(*val)); +} + +static int iqs5xx_write_burst(struct i2c_client *client, + u16 reg, const void *val, u16 len) +{ + int ret, i; + u16 mlen = sizeof(reg) + len; + u8 mbuf[sizeof(reg) + IQS5XX_WR_BYTES_MAX]; + + if (len > IQS5XX_WR_BYTES_MAX) + return -EINVAL; + + put_unaligned_be16(reg, mbuf); + memcpy(mbuf + sizeof(reg), val, len); + + /* + * The first addressing attempt outside of a communication window fails + * and must be retried, after which the device clock stretches until it + * is available. 
+ */ + for (i = 0; i < IQS5XX_NUM_RETRIES; i++) { + ret = i2c_master_send(client, mbuf, mlen); + if (ret == mlen) + return 0; + + usleep_range(200, 300); + } + + if (ret >= 0) + ret = -EIO; + + dev_err(&client->dev, "Failed to write to address 0x%04X: %d\n", + reg, ret); + + return ret; +} + +static int iqs5xx_write_word(struct i2c_client *client, u16 reg, u16 val) +{ + __be16 val_buf = cpu_to_be16(val); + + return iqs5xx_write_burst(client, reg, &val_buf, sizeof(val_buf)); +} + +static int iqs5xx_write_byte(struct i2c_client *client, u16 reg, u8 val) +{ + return iqs5xx_write_burst(client, reg, &val, sizeof(val)); +} + +static void iqs5xx_reset(struct i2c_client *client) +{ + struct iqs5xx_private *iqs5xx = i2c_get_clientdata(client); + + gpiod_set_value_cansleep(iqs5xx->reset_gpio, 1); + usleep_range(200, 300); + + gpiod_set_value_cansleep(iqs5xx->reset_gpio, 0); +} + +static int iqs5xx_bl_cmd(struct i2c_client *client, u8 bl_cmd, u16 bl_addr) +{ + struct i2c_msg msg; + int ret; + u8 mbuf[sizeof(bl_cmd) + sizeof(bl_addr)]; + + msg.addr = client->addr ^ IQS5XX_BL_ADDR_MASK; + msg.flags = 0; + msg.len = sizeof(bl_cmd); + msg.buf = mbuf; + + *mbuf = bl_cmd; + + switch (bl_cmd) { + case IQS5XX_BL_CMD_VER: + case IQS5XX_BL_CMD_CRC: + case IQS5XX_BL_CMD_EXEC: + break; + case IQS5XX_BL_CMD_READ: + msg.len += sizeof(bl_addr); + put_unaligned_be16(bl_addr, mbuf + sizeof(bl_cmd)); + break; + default: + return -EINVAL; + } + + ret = i2c_transfer(client->adapter, &msg, 1); + if (ret != 1) + goto msg_fail; + + switch (bl_cmd) { + case IQS5XX_BL_CMD_VER: + msg.len = sizeof(u16); + break; + case IQS5XX_BL_CMD_CRC: + msg.len = sizeof(u8); + /* + * This delay saves the bus controller the trouble of having to + * tolerate a relatively long clock-stretching period while the + * CRC is calculated. + */ + msleep(50); + break; + case IQS5XX_BL_CMD_EXEC: + usleep_range(10000, 10100); + /* fall through */ + default: + return 0; + } + + msg.flags = I2C_M_RD; + + ret = i2c_transfer(client->adapter, &msg, 1); + if (ret != 1) + goto msg_fail; + + if (bl_cmd == IQS5XX_BL_CMD_VER && + get_unaligned_be16(mbuf) != IQS5XX_BL_ID) { + dev_err(&client->dev, "Unrecognized bootloader ID: 0x%04X\n", + get_unaligned_be16(mbuf)); + return -EINVAL; + } + + if (bl_cmd == IQS5XX_BL_CMD_CRC && *mbuf != IQS5XX_BL_CRC_PASS) { + dev_err(&client->dev, "Bootloader CRC failed\n"); + return -EIO; + } + + return 0; + +msg_fail: + if (ret >= 0) + ret = -EIO; + + if (bl_cmd != IQS5XX_BL_CMD_VER) + dev_err(&client->dev, + "Unsuccessful bootloader command 0x%02X: %d\n", + bl_cmd, ret); + + return ret; +} + +static int iqs5xx_bl_open(struct i2c_client *client) +{ + int error, i, j; + + /* + * The device opens a bootloader polling window for 2 ms following the + * release of reset. If the host cannot establish communication during + * this time frame, it must cycle reset again. 
+ */ + for (i = 0; i < IQS5XX_BL_ATTEMPTS; i++) { + iqs5xx_reset(client); + + for (j = 0; j < IQS5XX_NUM_RETRIES; j++) { + error = iqs5xx_bl_cmd(client, IQS5XX_BL_CMD_VER, 0); + if (!error || error == -EINVAL) + return error; + } + } + + dev_err(&client->dev, "Failed to open bootloader: %d\n", error); + + return error; +} + +static int iqs5xx_bl_write(struct i2c_client *client, + u16 bl_addr, u8 *pmap_data, u16 pmap_len) +{ + struct i2c_msg msg; + int ret, i; + u8 mbuf[sizeof(bl_addr) + IQS5XX_BL_BLK_LEN_MAX]; + + if (pmap_len % IQS5XX_BL_BLK_LEN_MAX) + return -EINVAL; + + msg.addr = client->addr ^ IQS5XX_BL_ADDR_MASK; + msg.flags = 0; + msg.len = sizeof(mbuf); + msg.buf = mbuf; + + for (i = 0; i < pmap_len; i += IQS5XX_BL_BLK_LEN_MAX) { + put_unaligned_be16(bl_addr + i, mbuf); + memcpy(mbuf + sizeof(bl_addr), pmap_data + i, + sizeof(mbuf) - sizeof(bl_addr)); + + ret = i2c_transfer(client->adapter, &msg, 1); + if (ret != 1) + goto msg_fail; + + usleep_range(10000, 10100); + } + + return 0; + +msg_fail: + if (ret >= 0) + ret = -EIO; + + dev_err(&client->dev, "Failed to write block at address 0x%04X: %d\n", + bl_addr + i, ret); + + return ret; +} + +static int iqs5xx_bl_verify(struct i2c_client *client, + u16 bl_addr, u8 *pmap_data, u16 pmap_len) +{ + struct i2c_msg msg; + int ret, i; + u8 bl_data[IQS5XX_BL_BLK_LEN_MAX]; + + if (pmap_len % IQS5XX_BL_BLK_LEN_MAX) + return -EINVAL; + + msg.addr = client->addr ^ IQS5XX_BL_ADDR_MASK; + msg.flags = I2C_M_RD; + msg.len = sizeof(bl_data); + msg.buf = bl_data; + + for (i = 0; i < pmap_len; i += IQS5XX_BL_BLK_LEN_MAX) { + ret = iqs5xx_bl_cmd(client, IQS5XX_BL_CMD_READ, bl_addr + i); + if (ret) + return ret; + + ret = i2c_transfer(client->adapter, &msg, 1); + if (ret != 1) + goto msg_fail; + + if (memcmp(bl_data, pmap_data + i, sizeof(bl_data))) { + dev_err(&client->dev, + "Failed to verify block at address 0x%04X\n", + bl_addr + i); + return -EIO; + } + } + + return 0; + +msg_fail: + if (ret >= 0) + ret = -EIO; + + dev_err(&client->dev, "Failed to read block at address 0x%04X: %d\n", + bl_addr + i, ret); + + return ret; +} + +static int iqs5xx_set_state(struct i2c_client *client, u8 state) +{ + struct iqs5xx_private *iqs5xx = i2c_get_clientdata(client); + int error1, error2; + + if (iqs5xx->bl_status == IQS5XX_BL_STATUS_RESET) + return 0; + + mutex_lock(&iqs5xx->lock); + + /* + * Addressing the device outside of a communication window prompts it + * to assert the RDY output, so disable the interrupt line to prevent + * the handler from servicing a false interrupt. 
+ */ + disable_irq(client->irq); + + error1 = iqs5xx_write_byte(client, IQS5XX_SYS_CTRL1, state); + error2 = iqs5xx_write_byte(client, IQS5XX_END_COMM, 0); + + usleep_range(50, 100); + enable_irq(client->irq); + + mutex_unlock(&iqs5xx->lock); + + if (error1) + return error1; + + return error2; +} + +static int iqs5xx_open(struct input_dev *input) +{ + struct iqs5xx_private *iqs5xx = input_get_drvdata(input); + + return iqs5xx_set_state(iqs5xx->client, IQS5XX_RESUME); +} + +static void iqs5xx_close(struct input_dev *input) +{ + struct iqs5xx_private *iqs5xx = input_get_drvdata(input); + + iqs5xx_set_state(iqs5xx->client, IQS5XX_SUSPEND); +} + +static int iqs5xx_axis_init(struct i2c_client *client) +{ + struct iqs5xx_private *iqs5xx = i2c_get_clientdata(client); + struct touchscreen_properties prop; + struct input_dev *input; + int error; + u16 max_x, max_x_hw; + u16 max_y, max_y_hw; + u8 val; + + if (!iqs5xx->input) { + input = devm_input_allocate_device(&client->dev); + if (!input) + return -ENOMEM; + + input->name = client->name; + input->id.bustype = BUS_I2C; + input->open = iqs5xx_open; + input->close = iqs5xx_close; + + input_set_capability(input, EV_ABS, ABS_MT_POSITION_X); + input_set_capability(input, EV_ABS, ABS_MT_POSITION_Y); + input_set_capability(input, EV_ABS, ABS_MT_PRESSURE); + + error = input_mt_init_slots(input, + IQS5XX_NUM_CONTACTS, INPUT_MT_DIRECT); + if (error) { + dev_err(&client->dev, + "Failed to initialize slots: %d\n", error); + return error; + } + + input_set_drvdata(input, iqs5xx); + iqs5xx->input = input; + } + + touchscreen_parse_properties(iqs5xx->input, true, &prop); + + error = iqs5xx_read_byte(client, IQS5XX_TOTAL_RX, &val); + if (error) + return error; + max_x_hw = (val - 1) * IQS5XX_NUM_POINTS; + + error = iqs5xx_read_byte(client, IQS5XX_TOTAL_TX, &val); + if (error) + return error; + max_y_hw = (val - 1) * IQS5XX_NUM_POINTS; + + error = iqs5xx_read_byte(client, IQS5XX_XY_CFG0, &val); + if (error) + return error; + + if (val & IQS5XX_SWITCH_XY_AXIS) + swap(max_x_hw, max_y_hw); + + if (prop.swap_x_y) + val ^= IQS5XX_SWITCH_XY_AXIS; + + if (prop.invert_x) + val ^= prop.swap_x_y ? IQS5XX_FLIP_Y : IQS5XX_FLIP_X; + + if (prop.invert_y) + val ^= prop.swap_x_y ? IQS5XX_FLIP_X : IQS5XX_FLIP_Y; + + error = iqs5xx_write_byte(client, IQS5XX_XY_CFG0, val); + if (error) + return error; + + if (prop.max_x > max_x_hw) { + dev_err(&client->dev, "Invalid maximum x-coordinate: %u > %u\n", + prop.max_x, max_x_hw); + return -EINVAL; + } else if (prop.max_x == 0) { + error = iqs5xx_read_word(client, IQS5XX_X_RES, &max_x); + if (error) + return error; + + input_abs_set_max(iqs5xx->input, + prop.swap_x_y ? ABS_MT_POSITION_Y : + ABS_MT_POSITION_X, + max_x); + } else { + max_x = (u16)prop.max_x; + } + + if (prop.max_y > max_y_hw) { + dev_err(&client->dev, "Invalid maximum y-coordinate: %u > %u\n", + prop.max_y, max_y_hw); + return -EINVAL; + } else if (prop.max_y == 0) { + error = iqs5xx_read_word(client, IQS5XX_Y_RES, &max_y); + if (error) + return error; + + input_abs_set_max(iqs5xx->input, + prop.swap_x_y ? ABS_MT_POSITION_X : + ABS_MT_POSITION_Y, + max_y); + } else { + max_y = (u16)prop.max_y; + } + + /* + * Write horizontal and vertical resolution to the device in case its + * original defaults were overridden or swapped as per the properties + * specified in the device tree. + */ + error = iqs5xx_write_word(client, + prop.swap_x_y ? IQS5XX_Y_RES : IQS5XX_X_RES, + max_x); + if (error) + return error; + + return iqs5xx_write_word(client, + prop.swap_x_y ? 
IQS5XX_X_RES : IQS5XX_Y_RES, + max_y); +} + +static int iqs5xx_dev_init(struct i2c_client *client) +{ + struct iqs5xx_private *iqs5xx = i2c_get_clientdata(client); + struct iqs5xx_dev_id_info *dev_id_info; + int error; + u8 val; + u8 buf[sizeof(*dev_id_info) + 1]; + + error = iqs5xx_read_burst(client, IQS5XX_PROD_NUM, + &buf[1], sizeof(*dev_id_info)); + if (error) + return iqs5xx_bl_open(client); + + /* + * A000 and B000 devices use 8-bit and 16-bit addressing, respectively. + * Querying an A000 device's version information with 16-bit addressing + * gives the appearance that the data is shifted by one byte; a nonzero + * leading array element suggests this could be the case (in which case + * the missing zero is prepended). + */ + buf[0] = 0; + dev_id_info = (struct iqs5xx_dev_id_info *)&buf[(buf[1] > 0) ? 0 : 1]; + + switch (be16_to_cpu(dev_id_info->prod_num)) { + case IQS5XX_PROD_NUM_IQS550: + case IQS5XX_PROD_NUM_IQS572: + case IQS5XX_PROD_NUM_IQS525: + break; + default: + dev_err(&client->dev, "Unrecognized product number: %u\n", + be16_to_cpu(dev_id_info->prod_num)); + return -EINVAL; + } + + switch (be16_to_cpu(dev_id_info->proj_num)) { + case IQS5XX_PROJ_NUM_A000: + dev_err(&client->dev, "Unsupported project number: %u\n", + be16_to_cpu(dev_id_info->proj_num)); + return iqs5xx_bl_open(client); + case IQS5XX_PROJ_NUM_B000: + break; + default: + dev_err(&client->dev, "Unrecognized project number: %u\n", + be16_to_cpu(dev_id_info->proj_num)); + return -EINVAL; + } + + if (dev_id_info->major_ver < IQS5XX_MAJOR_VER_MIN) { + dev_err(&client->dev, "Unsupported major version: %u\n", + dev_id_info->major_ver); + return iqs5xx_bl_open(client); + } + + switch (dev_id_info->bl_status) { + case IQS5XX_BL_STATUS_AVAIL: + case IQS5XX_BL_STATUS_NONE: + break; + default: + dev_err(&client->dev, + "Unrecognized bootloader status: 0x%02X\n", + dev_id_info->bl_status); + return -EINVAL; + } + + error = iqs5xx_axis_init(client); + if (error) + return error; + + error = iqs5xx_read_byte(client, IQS5XX_SYS_CFG0, &val); + if (error) + return error; + + val |= IQS5XX_SETUP_COMPLETE; + val &= ~IQS5XX_SW_INPUT_EVENT; + error = iqs5xx_write_byte(client, IQS5XX_SYS_CFG0, val); + if (error) + return error; + + val = IQS5XX_TP_EVENT | IQS5XX_EVENT_MODE; + error = iqs5xx_write_byte(client, IQS5XX_SYS_CFG1, val); + if (error) + return error; + + error = iqs5xx_write_byte(client, IQS5XX_END_COMM, 0); + if (error) + return error; + + iqs5xx->bl_status = dev_id_info->bl_status; + + /* + * Closure of the first communication window that appears following the + * release of reset appears to kick off an initialization period during + * which further communication is met with clock stretching. The return + * from this function is delayed so that further communication attempts + * avoid this period. + */ + msleep(100); + + return 0; +} + +static irqreturn_t iqs5xx_irq(int irq, void *data) +{ + struct iqs5xx_private *iqs5xx = data; + struct iqs5xx_touch_data touch_data[IQS5XX_NUM_CONTACTS]; + struct i2c_client *client = iqs5xx->client; + struct input_dev *input = iqs5xx->input; + int error, i; + + /* + * This check is purely a precaution, as the device does not assert the + * RDY output during bootloader mode. If the device operates outside of + * bootloader mode, the input device is guaranteed to be allocated. 
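+	 * (iqs5xx_dev_init() assigns bl_status only after iqs5xx_axis_init()
+	 * has allocated the input device.)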
+ */ + if (iqs5xx->bl_status == IQS5XX_BL_STATUS_RESET) + return IRQ_NONE; + + error = iqs5xx_read_burst(client, IQS5XX_ABS_X, + touch_data, sizeof(touch_data)); + if (error) + return IRQ_NONE; + + for (i = 0; i < ARRAY_SIZE(touch_data); i++) { + u16 pressure = be16_to_cpu(touch_data[i].strength); + + input_mt_slot(input, i); + if (input_mt_report_slot_state(input, MT_TOOL_FINGER, + pressure != 0)) { + input_report_abs(input, ABS_MT_POSITION_X, + be16_to_cpu(touch_data[i].abs_x)); + input_report_abs(input, ABS_MT_POSITION_Y, + be16_to_cpu(touch_data[i].abs_y)); + input_report_abs(input, ABS_MT_PRESSURE, pressure); + } + } + + input_mt_sync_frame(input); + input_sync(input); + + error = iqs5xx_write_byte(client, IQS5XX_END_COMM, 0); + if (error) + return IRQ_NONE; + + /* + * Once the communication window is closed, a small delay is added to + * ensure the device's RDY output has been deasserted by the time the + * interrupt handler returns. + */ + usleep_range(50, 100); + + return IRQ_HANDLED; +} + +static int iqs5xx_fw_file_parse(struct i2c_client *client, + const char *fw_file, u8 *pmap) +{ + const struct firmware *fw; + struct iqs5xx_ihex_rec *rec; + size_t pos = 0; + int error, i; + u16 rec_num = 1; + u16 rec_addr; + u8 rec_len, rec_type, rec_chksm, chksm; + u8 rec_hdr[IQS5XX_REC_HDR_LEN]; + u8 rec_data[IQS5XX_REC_LEN_MAX]; + + /* + * Firmware exported from the vendor's configuration tool deviates from + * standard ihex as follows: (1) the checksum for records corresponding + * to user-exported settings is not recalculated, and (2) an address of + * 0xFFFF is used for the EOF record. + * + * Because the ihex2fw tool tolerates neither (1) nor (2), the slightly + * nonstandard ihex firmware is parsed directly by the driver. + */ + error = request_firmware(&fw, fw_file, &client->dev); + if (error) { + dev_err(&client->dev, "Failed to request firmware %s: %d\n", + fw_file, error); + return error; + } + + do { + if (pos + sizeof(*rec) > fw->size) { + dev_err(&client->dev, "Insufficient firmware size\n"); + error = -EINVAL; + break; + } + rec = (struct iqs5xx_ihex_rec *)(fw->data + pos); + pos += sizeof(*rec); + + if (rec->start != ':') { + dev_err(&client->dev, "Invalid start at record %u\n", + rec_num); + error = -EINVAL; + break; + } + + error = hex2bin(rec_hdr, rec->len, sizeof(rec_hdr)); + if (error) { + dev_err(&client->dev, "Invalid header at record %u\n", + rec_num); + break; + } + + rec_len = *rec_hdr; + rec_addr = get_unaligned_be16(rec_hdr + sizeof(rec_len)); + rec_type = *(rec_hdr + sizeof(rec_len) + sizeof(rec_addr)); + + if (pos + rec_len * 2 > fw->size) { + dev_err(&client->dev, "Insufficient firmware size\n"); + error = -EINVAL; + break; + } + pos += (rec_len * 2); + + error = hex2bin(rec_data, rec->data, rec_len); + if (error) { + dev_err(&client->dev, "Invalid data at record %u\n", + rec_num); + break; + } + + error = hex2bin(&rec_chksm, + rec->data + rec_len * 2, sizeof(rec_chksm)); + if (error) { + dev_err(&client->dev, "Invalid checksum at record %u\n", + rec_num); + break; + } + + chksm = 0; + for (i = 0; i < sizeof(rec_hdr); i++) + chksm += rec_hdr[i]; + for (i = 0; i < rec_len; i++) + chksm += rec_data[i]; + chksm = ~chksm + 1; + + if (chksm != rec_chksm && rec_addr < IQS5XX_CSTM) { + dev_err(&client->dev, + "Incorrect checksum at record %u\n", + rec_num); + error = -EINVAL; + break; + } + + switch (rec_type) { + case IQS5XX_REC_TYPE_DATA: + if (rec_addr < IQS5XX_CHKSM || + rec_addr > IQS5XX_PMAP_END) { + dev_err(&client->dev, + "Invalid address at record %u\n", + 
rec_num); + error = -EINVAL; + } else { + memcpy(pmap + rec_addr - IQS5XX_CHKSM, + rec_data, rec_len); + } + break; + case IQS5XX_REC_TYPE_EOF: + break; + default: + dev_err(&client->dev, "Invalid type at record %u\n", + rec_num); + error = -EINVAL; + } + + if (error) + break; + + rec_num++; + while (pos < fw->size) { + if (*(fw->data + pos) == ':') + break; + pos++; + } + } while (rec_type != IQS5XX_REC_TYPE_EOF); + + release_firmware(fw); + + return error; +} + +static int iqs5xx_fw_file_write(struct i2c_client *client, const char *fw_file) +{ + struct iqs5xx_private *iqs5xx = i2c_get_clientdata(client); + int error; + u8 *pmap; + + if (iqs5xx->bl_status == IQS5XX_BL_STATUS_NONE) + return -EPERM; + + pmap = kzalloc(IQS5XX_PMAP_LEN, GFP_KERNEL); + if (!pmap) + return -ENOMEM; + + error = iqs5xx_fw_file_parse(client, fw_file, pmap); + if (error) + goto err_kfree; + + mutex_lock(&iqs5xx->lock); + + /* + * Disable the interrupt line in case the first attempt(s) to enter the + * bootloader don't happen quickly enough, in which case the device may + * assert the RDY output until the next attempt. + */ + disable_irq(client->irq); + + iqs5xx->bl_status = IQS5XX_BL_STATUS_RESET; + + error = iqs5xx_bl_cmd(client, IQS5XX_BL_CMD_VER, 0); + if (error) { + error = iqs5xx_bl_open(client); + if (error) + goto err_reset; + } + + error = iqs5xx_bl_write(client, IQS5XX_CHKSM, pmap, IQS5XX_PMAP_LEN); + if (error) + goto err_reset; + + error = iqs5xx_bl_cmd(client, IQS5XX_BL_CMD_CRC, 0); + if (error) + goto err_reset; + + error = iqs5xx_bl_verify(client, IQS5XX_CSTM, + pmap + IQS5XX_CHKSM_LEN + IQS5XX_APP_LEN, + IQS5XX_CSTM_LEN); + if (error) + goto err_reset; + + error = iqs5xx_bl_cmd(client, IQS5XX_BL_CMD_EXEC, 0); + +err_reset: + if (error) { + iqs5xx_reset(client); + usleep_range(10000, 10100); + } + + error = iqs5xx_dev_init(client); + if (!error && iqs5xx->bl_status == IQS5XX_BL_STATUS_RESET) + error = -EINVAL; + + enable_irq(client->irq); + + mutex_unlock(&iqs5xx->lock); + +err_kfree: + kfree(pmap); + + return error; +} + +static ssize_t fw_file_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct iqs5xx_private *iqs5xx = dev_get_drvdata(dev); + struct i2c_client *client = iqs5xx->client; + size_t len = count; + bool input_reg = !iqs5xx->input; + char fw_file[IQS5XX_FW_FILE_LEN + 1]; + int error; + + if (!len) + return -EINVAL; + + if (buf[len - 1] == '\n') + len--; + + if (len > IQS5XX_FW_FILE_LEN) + return -ENAMETOOLONG; + + memcpy(fw_file, buf, len); + fw_file[len] = '\0'; + + error = iqs5xx_fw_file_write(client, fw_file); + if (error) + return error; + + /* + * If the input device was not allocated already, it is guaranteed to + * be allocated by this point and can finally be registered. 
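+	 * (iqs5xx_fw_file_write() returns success only after iqs5xx_dev_init(),
+	 * which allocates the input device via iqs5xx_axis_init().)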
+ */ + if (input_reg) { + error = input_register_device(iqs5xx->input); + if (error) { + dev_err(&client->dev, + "Failed to register device: %d\n", + error); + return error; + } + } + + return count; +} + +static DEVICE_ATTR_WO(fw_file); + +static struct attribute *iqs5xx_attrs[] = { + &dev_attr_fw_file.attr, + NULL, +}; + +static const struct attribute_group iqs5xx_attr_group = { + .attrs = iqs5xx_attrs, +}; + +static int __maybe_unused iqs5xx_suspend(struct device *dev) +{ + struct iqs5xx_private *iqs5xx = dev_get_drvdata(dev); + struct input_dev *input = iqs5xx->input; + int error = 0; + + if (!input) + return error; + + mutex_lock(&input->mutex); + + if (input->users) + error = iqs5xx_set_state(iqs5xx->client, IQS5XX_SUSPEND); + + mutex_unlock(&input->mutex); + + return error; +} + +static int __maybe_unused iqs5xx_resume(struct device *dev) +{ + struct iqs5xx_private *iqs5xx = dev_get_drvdata(dev); + struct input_dev *input = iqs5xx->input; + int error = 0; + + if (!input) + return error; + + mutex_lock(&input->mutex); + + if (input->users) + error = iqs5xx_set_state(iqs5xx->client, IQS5XX_RESUME); + + mutex_unlock(&input->mutex); + + return error; +} + +static SIMPLE_DEV_PM_OPS(iqs5xx_pm, iqs5xx_suspend, iqs5xx_resume); + +static int iqs5xx_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + struct iqs5xx_private *iqs5xx; + int error; + + iqs5xx = devm_kzalloc(&client->dev, sizeof(*iqs5xx), GFP_KERNEL); + if (!iqs5xx) + return -ENOMEM; + + dev_set_drvdata(&client->dev, iqs5xx); + + i2c_set_clientdata(client, iqs5xx); + iqs5xx->client = client; + + iqs5xx->reset_gpio = devm_gpiod_get(&client->dev, + "reset", GPIOD_OUT_LOW); + if (IS_ERR(iqs5xx->reset_gpio)) { + error = PTR_ERR(iqs5xx->reset_gpio); + dev_err(&client->dev, "Failed to request GPIO: %d\n", error); + return error; + } + + mutex_init(&iqs5xx->lock); + + iqs5xx_reset(client); + usleep_range(10000, 10100); + + error = iqs5xx_dev_init(client); + if (error) + return error; + + error = devm_request_threaded_irq(&client->dev, client->irq, + NULL, iqs5xx_irq, IRQF_ONESHOT, + client->name, iqs5xx); + if (error) { + dev_err(&client->dev, "Failed to request IRQ: %d\n", error); + return error; + } + + error = devm_device_add_group(&client->dev, &iqs5xx_attr_group); + if (error) { + dev_err(&client->dev, "Failed to add attributes: %d\n", error); + return error; + } + + if (iqs5xx->input) { + error = input_register_device(iqs5xx->input); + if (error) + dev_err(&client->dev, + "Failed to register device: %d\n", + error); + } + + return error; +} + +static const struct i2c_device_id iqs5xx_id[] = { + { "iqs550", 0 }, + { "iqs572", 1 }, + { "iqs525", 2 }, + { } +}; +MODULE_DEVICE_TABLE(i2c, iqs5xx_id); + +static const struct of_device_id iqs5xx_of_match[] = { + { .compatible = "azoteq,iqs550" }, + { .compatible = "azoteq,iqs572" }, + { .compatible = "azoteq,iqs525" }, + { } +}; +MODULE_DEVICE_TABLE(of, iqs5xx_of_match); + +static struct i2c_driver iqs5xx_i2c_driver = { + .driver = { + .name = "iqs5xx", + .of_match_table = iqs5xx_of_match, + .pm = &iqs5xx_pm, + }, + .id_table = iqs5xx_id, + .probe = iqs5xx_probe, +}; +module_i2c_driver(iqs5xx_i2c_driver); + +MODULE_AUTHOR("Jeff LaBundy <jeff@labundy.com>"); +MODULE_DESCRIPTION("Azoteq IQS550/572/525 Trackpad/Touchscreen Controller"); +MODULE_LICENSE("GPL"); diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index da1fc17295d9..b996967af8d9 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ 
-1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond, { netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n", newval->string, newval->value); - - if (bond->dev->flags & IFF_UP) { - if (!newval->value) - bond->recv_probe = NULL; - else if (bond->params.arp_interval) - bond->recv_probe = bond_arp_rcv; - } bond->params.arp_validate = newval->value; return 0; diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c b/drivers/net/ethernet/allwinner/sun4i-emac.c index 37ebd890ef51..9e06dff619c3 100644 --- a/drivers/net/ethernet/allwinner/sun4i-emac.c +++ b/drivers/net/ethernet/allwinner/sun4i-emac.c @@ -871,7 +871,7 @@ static int emac_probe(struct platform_device *pdev) /* Read MAC-address from DT */ mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(ndev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, mac_addr); /* Check if the MAC address is valid, if not get a random one */ if (!is_valid_ether_addr(ndev->dev_addr)) { diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index 7f89ad5c336d..13a1d99b29c6 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -961,7 +961,7 @@ int arc_emac_probe(struct net_device *ndev, int interface) mac_addr = of_get_mac_address(dev->of_node); if (!IS_ERR(mac_addr)) - memcpy(ndev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, mac_addr); else eth_hw_addr_random(ndev); diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c index 15b1130aa4ae..0e5de88fd6e8 100644 --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c @@ -1504,7 +1504,7 @@ static int octeon_mgmt_probe(struct platform_device *pdev) mac = of_get_mac_address(pdev->dev.of_node); if (!IS_ERR(mac)) - memcpy(netdev->dev_addr, mac, ETH_ALEN); + ether_addr_copy(netdev->dev_addr, mac); else eth_hw_addr_random(netdev); diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c index 953ee5616801..5e1aff9a5fd6 100644 --- a/drivers/net/ethernet/davicom/dm9000.c +++ b/drivers/net/ethernet/davicom/dm9000.c @@ -1413,7 +1413,7 @@ static struct dm9000_plat_data *dm9000_parse_dt(struct device *dev) mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(pdata->dev_addr, mac_addr, sizeof(pdata->dev_addr)); + ether_addr_copy(pdata->dev_addr, mac_addr); return pdata; } diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c index 7b7e526869a7..30cdb246d020 100644 --- a/drivers/net/ethernet/freescale/fec_mpc52xx.c +++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c @@ -903,7 +903,7 @@ static int mpc52xx_fec_probe(struct platform_device *op) */ mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) { - memcpy(ndev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, mac_addr); } else { struct mpc52xx_fec __iomem *fec = priv->fec; diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c index 9cd2c28d17df..7ab8095db192 100644 --- a/drivers/net/ethernet/freescale/fman/mac.c +++ b/drivers/net/ethernet/freescale/fman/mac.c @@ -729,7 +729,7 @@ static int mac_probe(struct platform_device *_of_dev) err = -EINVAL; goto _return_of_get_parent; } - memcpy(mac_dev->addr, mac_addr, sizeof(mac_dev->addr)); + ether_addr_copy(mac_dev->addr, mac_addr); /* Get the port handles */ nph = of_count_phandle_with_args(mac_node, 
"fsl,fman-ports", NULL); diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c index 90ea7a115d0f..5fad73b2e123 100644 --- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c +++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c @@ -1015,7 +1015,7 @@ static int fs_enet_probe(struct platform_device *ofdev) mac_addr = of_get_mac_address(ofdev->dev.of_node); if (!IS_ERR(mac_addr)) - memcpy(ndev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, mac_addr); ret = fep->ops->allocate_bd(ndev); if (ret) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index df13c693b038..e670cd293dba 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -873,7 +873,7 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev) mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(dev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(dev->dev_addr, mac_addr); if (model && !strcasecmp(model, "TSEC")) priv->device_flags |= FSL_GIANFAR_DEV_HAS_GIGABIT | diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c index 216e99af2b5a..4d6892d2f0a4 100644 --- a/drivers/net/ethernet/freescale/ucc_geth.c +++ b/drivers/net/ethernet/freescale/ucc_geth.c @@ -3911,7 +3911,7 @@ static int ucc_geth_probe(struct platform_device* ofdev) mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(dev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(dev->dev_addr, mac_addr); ugeth->ug_info = ug_info; ugeth->dev = device; diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index b398d6c94dbd..3dcd9c3d8781 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -118,7 +118,7 @@ static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter); static int ibmvnic_init(struct ibmvnic_adapter *); static int ibmvnic_reset_init(struct ibmvnic_adapter *); static void release_crq_queue(struct ibmvnic_adapter *); -static int __ibmvnic_set_mac(struct net_device *netdev, struct sockaddr *p); +static int __ibmvnic_set_mac(struct net_device *, u8 *); static int init_crq_queue(struct ibmvnic_adapter *adapter); static int send_query_phys_parms(struct ibmvnic_adapter *adapter); @@ -849,11 +849,7 @@ static int ibmvnic_login(struct net_device *netdev) } } while (retry); - /* handle pending MAC address changes after successful login */ - if (adapter->mac_change_pending) { - __ibmvnic_set_mac(netdev, &adapter->desired.mac); - adapter->mac_change_pending = false; - } + __ibmvnic_set_mac(netdev, adapter->mac_addr); return 0; } @@ -1115,7 +1111,6 @@ static int ibmvnic_open(struct net_device *netdev) } rc = __ibmvnic_open(netdev); - netif_carrier_on(netdev); return rc; } @@ -1686,28 +1681,40 @@ static void ibmvnic_set_multi(struct net_device *netdev) } } -static int __ibmvnic_set_mac(struct net_device *netdev, struct sockaddr *p) +static int __ibmvnic_set_mac(struct net_device *netdev, u8 *dev_addr) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); - struct sockaddr *addr = p; union ibmvnic_crq crq; int rc; - if (!is_valid_ether_addr(addr->sa_data)) - return -EADDRNOTAVAIL; + if (!is_valid_ether_addr(dev_addr)) { + rc = -EADDRNOTAVAIL; + goto err; + } memset(&crq, 0, sizeof(crq)); crq.change_mac_addr.first = IBMVNIC_CRQ_CMD; crq.change_mac_addr.cmd = CHANGE_MAC_ADDR; - ether_addr_copy(&crq.change_mac_addr.mac_addr[0], 
addr->sa_data); + ether_addr_copy(&crq.change_mac_addr.mac_addr[0], dev_addr); init_completion(&adapter->fw_done); rc = ibmvnic_send_crq(adapter, &crq); - if (rc) - return rc; + if (rc) { + rc = -EIO; + goto err; + } + wait_for_completion(&adapter->fw_done); /* netdev->dev_addr is changed in handle_change_mac_rsp function */ - return adapter->fw_done_rc ? -EIO : 0; + if (adapter->fw_done_rc) { + rc = -EIO; + goto err; + } + + return 0; +err: + ether_addr_copy(adapter->mac_addr, netdev->dev_addr); + return rc; } static int ibmvnic_set_mac(struct net_device *netdev, void *p) @@ -1716,13 +1723,10 @@ static int ibmvnic_set_mac(struct net_device *netdev, void *p) struct sockaddr *addr = p; int rc; - if (adapter->state == VNIC_PROBED) { - memcpy(&adapter->desired.mac, addr, sizeof(struct sockaddr)); - adapter->mac_change_pending = true; - return 0; - } - - rc = __ibmvnic_set_mac(netdev, addr); + rc = 0; + ether_addr_copy(adapter->mac_addr, addr->sa_data); + if (adapter->state != VNIC_PROBED) + rc = __ibmvnic_set_mac(netdev, addr->sa_data); return rc; } @@ -1859,8 +1863,6 @@ static int do_reset(struct ibmvnic_adapter *adapter, adapter->reset_reason != VNIC_RESET_CHANGE_PARAM) call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, netdev); - netif_carrier_on(netdev); - return 0; } @@ -1930,8 +1932,6 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, return 0; } - netif_carrier_on(netdev); - return 0; } @@ -3937,8 +3937,8 @@ static int handle_change_mac_rsp(union ibmvnic_crq *crq, dev_err(dev, "Error %ld in CHANGE_MAC_ADDR_RSP\n", rc); goto out; } - memcpy(netdev->dev_addr, &crq->change_mac_addr_rsp.mac_addr[0], - ETH_ALEN); + ether_addr_copy(netdev->dev_addr, + &crq->change_mac_addr_rsp.mac_addr[0]); out: complete(&adapter->fw_done); return rc; @@ -4475,6 +4475,10 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq, crq->link_state_indication.phys_link_state; adapter->logical_link_state = crq->link_state_indication.logical_link_state; + if (adapter->phys_link_state && adapter->logical_link_state) + netif_carrier_on(netdev); + else + netif_carrier_off(netdev); break; case CHANGE_MAC_ADDR_RSP: netdev_dbg(netdev, "Got MAC address change Response\n"); @@ -4852,8 +4856,6 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) init_completion(&adapter->init_done); adapter->resetting = false; - adapter->mac_change_pending = false; - do { rc = init_crq_queue(adapter); if (rc) { diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h index cffdac372a33..dcf2eb6d9290 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.h +++ b/drivers/net/ethernet/ibm/ibmvnic.h @@ -969,7 +969,6 @@ struct ibmvnic_tunables { u64 rx_entries; u64 tx_entries; u64 mtu; - struct sockaddr mac; }; struct ibmvnic_adapter { @@ -1091,7 +1090,6 @@ struct ibmvnic_adapter { bool resetting; bool napi_enabled, from_passive_init; - bool mac_change_pending; bool failover_pending; bool force_reset_recovery; diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index 07e254fc96ef..409b69fd4374 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -2750,7 +2750,7 @@ static int mv643xx_eth_shared_of_add_port(struct platform_device *pdev, mac_addr = of_get_mac_address(pnp); if (!IS_ERR(mac_addr)) - memcpy(ppd.mac_addr, mac_addr, ETH_ALEN); + ether_addr_copy(ppd.mac_addr, mac_addr); mv643xx_eth_property(pnp, "tx-queue-size", ppd.tx_queue_size); mv643xx_eth_property(pnp, "tx-sram-addr", 
ppd.tx_sram_addr); diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 8186135883ed..e758650b2c26 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -4565,7 +4565,7 @@ static int mvneta_probe(struct platform_device *pdev) dt_mac_addr = of_get_mac_address(dn); if (!IS_ERR(dt_mac_addr)) { mac_from = "device tree"; - memcpy(dev->dev_addr, dt_mac_addr, ETH_ALEN); + ether_addr_copy(dev->dev_addr, dt_mac_addr); } else { mvneta_get_mac_addr(pp, hw_mac_addr); if (is_valid_ether_addr(hw_mac_addr)) { diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 56d43d9b43ef..d38952eb7aa9 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -5058,8 +5058,10 @@ static int mvpp2_port_probe(struct platform_device *pdev, dev->hw_features |= features | NETIF_F_RXCSUM | NETIF_F_GRO | NETIF_F_HW_VLAN_CTAG_FILTER; - if (mvpp22_rss_is_supported()) + if (mvpp22_rss_is_supported()) { dev->hw_features |= NETIF_F_RXHASH; + dev->features |= NETIF_F_NTUPLE; + } if (port->pool_long->id == MVPP2_BM_JUMBO && port->id != 0) { dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM); diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c index 9d070cca3e9e..5adf307fbbfd 100644 --- a/drivers/net/ethernet/marvell/sky2.c +++ b/drivers/net/ethernet/marvell/sky2.c @@ -4805,7 +4805,7 @@ static struct net_device *sky2_init_netdev(struct sky2_hw *hw, unsigned port, */ iap = of_get_mac_address(hw->pdev->dev.of_node); if (!IS_ERR(iap)) - memcpy(dev->dev_addr, iap, ETH_ALEN); + ether_addr_copy(dev->dev_addr, iap); else memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 + port * 8, ETH_ALEN); diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c index b44172a901ed..ba4fdf1b0dea 100644 --- a/drivers/net/ethernet/micrel/ks8851.c +++ b/drivers/net/ethernet/micrel/ks8851.c @@ -426,7 +426,7 @@ static void ks8851_init_mac(struct ks8851_net *ks) mac_addr = of_get_mac_address(ks->spidev->dev.of_node); if (!IS_ERR(mac_addr)) { - memcpy(dev->dev_addr, mac_addr, ETH_ALEN); + ether_addr_copy(dev->dev_addr, mac_addr); ks8851_write_mac_addr(dev); return; } diff --git a/drivers/net/ethernet/micrel/ks8851_mll.c b/drivers/net/ethernet/micrel/ks8851_mll.c index dc76b0d15234..e5c8412c08c1 100644 --- a/drivers/net/ethernet/micrel/ks8851_mll.c +++ b/drivers/net/ethernet/micrel/ks8851_mll.c @@ -1328,7 +1328,7 @@ static int ks8851_probe(struct platform_device *pdev) if (pdev->dev.of_node) { mac = of_get_mac_address(pdev->dev.of_node); if (!IS_ERR(mac)) - memcpy(ks->mac_addr, mac, ETH_ALEN); + ether_addr_copy(ks->mac_addr, mac); } else { struct ks8851_mll_platform_data *pdata; diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c index da138edddd32..fec604c4c0d3 100644 --- a/drivers/net/ethernet/nxp/lpc_eth.c +++ b/drivers/net/ethernet/nxp/lpc_eth.c @@ -1369,7 +1369,7 @@ static int lpc_eth_drv_probe(struct platform_device *pdev) if (!is_valid_ether_addr(ndev->dev_addr)) { const char *macaddr = of_get_mac_address(np); if (!IS_ERR(macaddr)) - memcpy(ndev->dev_addr, macaddr, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, macaddr); } if (!is_valid_ether_addr(ndev->dev_addr)) eth_hw_addr_random(ndev); diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 7c4e282242d5..6354f19a31eb 100644 --- 
a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -3193,7 +3193,7 @@ static struct sh_eth_plat_data *sh_eth_parse_dt(struct device *dev) mac_addr = of_get_mac_address(np); if (!IS_ERR(mac_addr)) - memcpy(pdata->mac_addr, mac_addr, ETH_ALEN); + ether_addr_copy(pdata->mac_addr, mac_addr); pdata->no_ether_link = of_property_read_bool(np, "renesas,no-ether-link"); diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c index 70cce63a6081..696037d5ac3d 100644 --- a/drivers/net/ethernet/seeq/sgiseeq.c +++ b/drivers/net/ethernet/seeq/sgiseeq.c @@ -735,6 +735,7 @@ static int sgiseeq_probe(struct platform_device *pdev) } platform_set_drvdata(pdev, dev); + SET_NETDEV_DEV(dev, &pdev->dev); sp = netdev_priv(dev); /* Make private data page aligned */ diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c index 195669f550f0..ba124a4da793 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c @@ -1015,6 +1015,8 @@ static struct mac_device_info *sun8i_dwmac_setup(void *ppriv) mac->mac = &sun8i_dwmac_ops; mac->dma = &sun8i_dwmac_dma_ops; + priv->dev->priv_flags |= IFF_UNICAST_FLT; + /* The loopback bit seems to be re-set when link change * Simply mask it each time * Speed 10/100/1000 are set in BIT(2)/BIT(3) diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile index c3f53a40b48f..ed12e1e5df2f 100644 --- a/drivers/net/ethernet/ti/Makefile +++ b/drivers/net/ethernet/ti/Makefile @@ -19,4 +19,4 @@ ti_cpsw-y := cpsw.o davinci_cpdma.o cpsw_ale.o cpsw_priv.o cpsw_sl.o cpsw_ethtoo obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o keystone_netcp-y := netcp_core.o cpsw_ale.o obj-$(CONFIG_TI_KEYSTONE_NETCP_ETHSS) += keystone_netcp_ethss.o -keystone_netcp_ethss-y := netcp_ethss.o netcp_sgmii.o netcp_xgbepcsr.o +keystone_netcp_ethss-y := netcp_ethss.o netcp_sgmii.o netcp_xgbepcsr.o cpsw_ale.o diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index b18eeb05b993..634fc484a0b3 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2233,7 +2233,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data, no_phy_slave: mac_addr = of_get_mac_address(slave_node); if (!IS_ERR(mac_addr)) { - memcpy(slave_data->mac_addr, mac_addr, ETH_ALEN); + ether_addr_copy(slave_data->mac_addr, mac_addr); } else { ret = ti_cm_get_macid(&pdev->dev, i, slave_data->mac_addr); diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c index 997475c209c0..47c45152132e 100644 --- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -361,7 +361,7 @@ static void temac_do_set_mac_address(struct net_device *ndev) static int temac_init_mac_address(struct net_device *ndev, const void *address) { - memcpy(ndev->dev_addr, address, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, address); if (!is_valid_ether_addr(ndev->dev_addr)) eth_hw_addr_random(ndev); temac_do_set_mac_address(ndev); diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c index 691170753563..6886270da695 100644 --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c @@ -1167,7 +1167,7 @@ static int xemaclite_of_probe(struct platform_device *ofdev) if (!IS_ERR(mac_address)) { /* Set the MAC address. 
*/ - memcpy(ndev->dev_addr, mac_address, ETH_ALEN); + ether_addr_copy(ndev->dev_addr, mac_address); } else { dev_warn(dev, "No MAC address found, using random\n"); eth_hw_addr_random(ndev); diff --git a/drivers/net/phy/mdio-mux-meson-g12a.c b/drivers/net/phy/mdio-mux-meson-g12a.c index 6fa29ea8e2a3..6644762ff2ab 100644 --- a/drivers/net/phy/mdio-mux-meson-g12a.c +++ b/drivers/net/phy/mdio-mux-meson-g12a.c @@ -33,7 +33,7 @@ #define ETH_PLL_CTL7 0x60 #define ETH_PHY_CNTL0 0x80 -#define EPHY_G12A_ID 0x33000180 +#define EPHY_G12A_ID 0x33010180 #define ETH_PHY_CNTL1 0x84 #define PHY_CNTL1_ST_MODE GENMASK(2, 0) #define PHY_CNTL1_ST_PHYADD GENMASK(7, 3) diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index 761ce3b1e7bd..a669945eb829 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -217,12 +217,12 @@ static int rtl8211e_config_init(struct phy_device *phydev) if (oldpage < 0) goto err_restore_page; - ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); + ret = __phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); if (ret) goto err_restore_page; - ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, - val); + ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, + val); err_restore_page: return phy_restore_page(phydev, oldpage, ret); @@ -275,6 +275,8 @@ static struct phy_driver realtek_drvs[] = { .config_aneg = rtl8211_config_aneg, .read_mmd = &genphy_read_mmd_unsupported, .write_mmd = &genphy_write_mmd_unsupported, + .read_page = rtl821x_read_page, + .write_page = rtl821x_write_page, }, { PHY_ID_MATCH_EXACT(0x001cc912), .name = "RTL8211B Gigabit Ethernet", @@ -284,12 +286,16 @@ static struct phy_driver realtek_drvs[] = { .write_mmd = &genphy_write_mmd_unsupported, .suspend = rtl8211b_suspend, .resume = rtl8211b_resume, + .read_page = rtl821x_read_page, + .write_page = rtl821x_write_page, }, { PHY_ID_MATCH_EXACT(0x001cc913), .name = "RTL8211C Gigabit Ethernet", .config_init = rtl8211c_config_init, .read_mmd = &genphy_read_mmd_unsupported, .write_mmd = &genphy_write_mmd_unsupported, + .read_page = rtl821x_read_page, + .write_page = rtl821x_write_page, }, { PHY_ID_MATCH_EXACT(0x001cc914), .name = "RTL8211DN Gigabit Ethernet", @@ -297,6 +303,8 @@ static struct phy_driver realtek_drvs[] = { .config_intr = rtl8211e_config_intr, .suspend = genphy_suspend, .resume = genphy_resume, + .read_page = rtl821x_read_page, + .write_page = rtl821x_write_page, }, { PHY_ID_MATCH_EXACT(0x001cc915), .name = "RTL8211E Gigabit Ethernet", @@ -305,6 +313,8 @@ static struct phy_driver realtek_drvs[] = { .config_intr = &rtl8211e_config_intr, .suspend = genphy_suspend, .resume = genphy_resume, + .read_page = rtl821x_read_page, + .write_page = rtl821x_write_page, }, { PHY_ID_MATCH_EXACT(0x001cc916), .name = "RTL8211F Gigabit Ethernet", diff --git a/drivers/net/wireless/mediatek/mt76/eeprom.c b/drivers/net/wireless/mediatek/mt76/eeprom.c index 04964937a3af..b7a49ae6b327 100644 --- a/drivers/net/wireless/mediatek/mt76/eeprom.c +++ b/drivers/net/wireless/mediatek/mt76/eeprom.c @@ -95,7 +95,7 @@ mt76_eeprom_override(struct mt76_dev *dev) mac = of_get_mac_address(np); if (!IS_ERR(mac)) - memcpy(dev->macaddr, mac, ETH_ALEN); + ether_addr_copy(dev->macaddr, mac); #endif if (!is_valid_ether_addr(dev->macaddr)) { diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c index 9649cd53e955..6f1be80e8c4e 100644 --- a/drivers/of/of_net.c +++ b/drivers/of/of_net.c @@ -52,39 +52,25 @@ static const void *of_get_mac_addr(struct device_node *np, const char *name) 
static const void *of_get_mac_addr_nvmem(struct device_node *np) { int ret; - u8 mac[ETH_ALEN]; - struct property *pp; + const void *mac; + u8 nvmem_mac[ETH_ALEN]; struct platform_device *pdev = of_find_device_by_node(np); if (!pdev) return ERR_PTR(-ENODEV); - ret = nvmem_get_mac_address(&pdev->dev, &mac); - if (ret) + ret = nvmem_get_mac_address(&pdev->dev, &nvmem_mac); + if (ret) { + put_device(&pdev->dev); return ERR_PTR(ret); - - pp = devm_kzalloc(&pdev->dev, sizeof(*pp), GFP_KERNEL); - if (!pp) - return ERR_PTR(-ENOMEM); - - pp->name = "nvmem-mac-address"; - pp->length = ETH_ALEN; - pp->value = devm_kmemdup(&pdev->dev, mac, ETH_ALEN, GFP_KERNEL); - if (!pp->value) { - ret = -ENOMEM; - goto free; } - ret = of_add_property(np, pp); - if (ret) - goto free; - - return pp->value; -free: - devm_kfree(&pdev->dev, pp->value); - devm_kfree(&pdev->dev, pp); + mac = devm_kmemdup(&pdev->dev, nvmem_mac, ETH_ALEN, GFP_KERNEL); + put_device(&pdev->dev); + if (!mac) + return ERR_PTR(-ENOMEM); - return ERR_PTR(ret); + return mac; } /** diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index c27c27300d95..e474127dd255 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -451,7 +451,9 @@ failed_out: /** * ext2_alloc_branch - allocate and set up a chain of blocks. * @inode: owner - * @num: depth of the chain (number of blocks to allocate) + * @indirect_blks: depth of the chain (number of blocks to allocate) + * @blks: number of allocated direct blocks + * @goal: preferred place for allocation * @offsets: offsets (in the blocks) to store the pointers to next. * @branch: place to store the chain in. * diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c index 63e599524085..217b290ae3a5 100644 --- a/fs/f2fs/acl.c +++ b/fs/f2fs/acl.c @@ -285,7 +285,7 @@ static int f2fs_acl_create_masq(struct posix_acl *acl, umode_t *mode_p) /* assert(atomic_read(acl->a_refcount) == 1); */ FOREACH_ACL_ENTRY(pa, acl, pe) { - switch(pa->e_tag) { + switch (pa->e_tag) { case ACL_USER_OBJ: pa->e_perm &= (mode >> 6) | ~S_IRWXO; mode &= (pa->e_perm << 6) | ~S_IRWXU; @@ -326,7 +326,7 @@ static int f2fs_acl_create_masq(struct posix_acl *acl, umode_t *mode_p) } *mode_p = (*mode_p & ~S_IRWXUGO) | mode; - return not_equiv; + return not_equiv; } static int f2fs_acl_create(struct inode *dir, umode_t *mode, diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index a98e1b02279e..ed70b68b2b38 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -66,7 +66,7 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index, .old_blkaddr = index, .new_blkaddr = index, .encrypted_page = NULL, - .is_meta = is_meta, + .is_por = !is_meta, }; int err; @@ -130,6 +130,30 @@ struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) return __get_meta_page(sbi, index, false); } +static bool __is_bitmap_valid(struct f2fs_sb_info *sbi, block_t blkaddr, + int type) +{ + struct seg_entry *se; + unsigned int segno, offset; + bool exist; + + if (type != DATA_GENERIC_ENHANCE && type != DATA_GENERIC_ENHANCE_READ) + return true; + + segno = GET_SEGNO(sbi, blkaddr); + offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr); + se = get_seg_entry(sbi, segno); + + exist = f2fs_test_bit(offset, se->cur_valid_map); + if (!exist && type == DATA_GENERIC_ENHANCE) { + f2fs_msg(sbi->sb, KERN_ERR, "Inconsistent error " + "blkaddr:%u, sit bitmap:%d", blkaddr, exist); + set_sbi_flag(sbi, SBI_NEED_FSCK); + WARN_ON(1); + } + return exist; +} + bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) { @@ -151,15 +175,22 @@ bool 
f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, return false; break; case META_POR: + if (unlikely(blkaddr >= MAX_BLKADDR(sbi) || + blkaddr < MAIN_BLKADDR(sbi))) + return false; + break; case DATA_GENERIC: + case DATA_GENERIC_ENHANCE: + case DATA_GENERIC_ENHANCE_READ: if (unlikely(blkaddr >= MAX_BLKADDR(sbi) || - blkaddr < MAIN_BLKADDR(sbi))) { - if (type == DATA_GENERIC) { - f2fs_msg(sbi->sb, KERN_WARNING, - "access invalid blkaddr:%u", blkaddr); - WARN_ON(1); - } + blkaddr < MAIN_BLKADDR(sbi))) { + f2fs_msg(sbi->sb, KERN_WARNING, + "access invalid blkaddr:%u", blkaddr); + set_sbi_flag(sbi, SBI_NEED_FSCK); + WARN_ON(1); return false; + } else { + return __is_bitmap_valid(sbi, blkaddr, type); } break; case META_GENERIC: @@ -189,7 +220,7 @@ int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, .op_flags = sync ? (REQ_META | REQ_PRIO) : REQ_RAHEAD, .encrypted_page = NULL, .in_list = false, - .is_meta = (type != META_POR), + .is_por = (type == META_POR), }; struct blk_plug plug; @@ -644,6 +675,12 @@ int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi) if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG)) return 0; + if (bdev_read_only(sbi->sb->s_bdev)) { + f2fs_msg(sbi->sb, KERN_INFO, "write access " + "unavailable, skipping orphan cleanup"); + return 0; + } + if (s_flags & SB_RDONLY) { f2fs_msg(sbi->sb, KERN_INFO, "orphan cleanup on readonly fs"); sbi->sb->s_flags &= ~SB_RDONLY; @@ -758,13 +795,27 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk) } } +static __u32 f2fs_checkpoint_chksum(struct f2fs_sb_info *sbi, + struct f2fs_checkpoint *ckpt) +{ + unsigned int chksum_ofs = le32_to_cpu(ckpt->checksum_offset); + __u32 chksum; + + chksum = f2fs_crc32(sbi, ckpt, chksum_ofs); + if (chksum_ofs < CP_CHKSUM_OFFSET) { + chksum_ofs += sizeof(chksum); + chksum = f2fs_chksum(sbi, chksum, (__u8 *)ckpt + chksum_ofs, + F2FS_BLKSIZE - chksum_ofs); + } + return chksum; +} + static int get_checkpoint_version(struct f2fs_sb_info *sbi, block_t cp_addr, struct f2fs_checkpoint **cp_block, struct page **cp_page, unsigned long long *version) { - unsigned long blk_size = sbi->blocksize; size_t crc_offset = 0; - __u32 crc = 0; + __u32 crc; *cp_page = f2fs_get_meta_page(sbi, cp_addr); if (IS_ERR(*cp_page)) @@ -773,15 +824,27 @@ static int get_checkpoint_version(struct f2fs_sb_info *sbi, block_t cp_addr, *cp_block = (struct f2fs_checkpoint *)page_address(*cp_page); crc_offset = le32_to_cpu((*cp_block)->checksum_offset); - if (crc_offset > (blk_size - sizeof(__le32))) { + if (crc_offset < CP_MIN_CHKSUM_OFFSET || + crc_offset > CP_CHKSUM_OFFSET) { f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc_offset: %zu", crc_offset); return -EINVAL; } - crc = cur_cp_crc(*cp_block); - if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) { + if (__is_set_ckpt_flags(*cp_block, CP_LARGE_NAT_BITMAP_FLAG)) { + if (crc_offset != CP_MIN_CHKSUM_OFFSET) { + f2fs_put_page(*cp_page, 1); + f2fs_msg(sbi->sb, KERN_WARNING, + "layout of large_nat_bitmap is deprecated, " + "run fsck to repair, chksum_offset: %zu", + crc_offset); + return -EINVAL; + } + } + + crc = f2fs_checkpoint_chksum(sbi, *cp_block); + if (crc != cur_cp_crc(*cp_block)) { f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value"); return -EINVAL; @@ -1009,13 +1072,11 @@ retry: if (inode) { unsigned long cur_ino = inode->i_ino; - if (is_dir) - F2FS_I(inode)->cp_task = current; + F2FS_I(inode)->cp_task = current; filemap_fdatawrite(inode->i_mapping); - if (is_dir) - 
F2FS_I(inode)->cp_task = NULL; + F2FS_I(inode)->cp_task = NULL; iput(inode); /* We need to give cpu to another writers. */ @@ -1391,7 +1452,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) get_sit_bitmap(sbi, __bitmap_ptr(sbi, SIT_BITMAP)); get_nat_bitmap(sbi, __bitmap_ptr(sbi, NAT_BITMAP)); - crc32 = f2fs_crc32(sbi, ckpt, le32_to_cpu(ckpt->checksum_offset)); + crc32 = f2fs_checkpoint_chksum(sbi, ckpt); *((__le32 *)((unsigned char *)ckpt + le32_to_cpu(ckpt->checksum_offset))) = cpu_to_le32(crc32); @@ -1475,7 +1536,11 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) clear_sbi_flag(sbi, SBI_IS_DIRTY); clear_sbi_flag(sbi, SBI_NEED_CP); clear_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH); + + spin_lock(&sbi->stat_lock); sbi->unusable_block_count = 0; + spin_unlock(&sbi->stat_lock); + __set_cp_next_pack(sbi); /* @@ -1500,6 +1565,9 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) unsigned long long ckpt_ver; int err = 0; + if (f2fs_readonly(sbi->sb) || f2fs_hw_is_readonly(sbi)) + return -EROFS; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { if (cpc->reason != CP_PAUSE) return 0; @@ -1516,10 +1584,6 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) err = -EIO; goto out; } - if (f2fs_readonly(sbi->sb)) { - err = -EROFS; - goto out; - } trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "start block_ops"); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 64040e998439..eda4181d2092 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -218,12 +218,14 @@ struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi, struct block_device *bdev = sbi->sb->s_bdev; int i; - for (i = 0; i < sbi->s_ndevs; i++) { - if (FDEV(i).start_blk <= blk_addr && - FDEV(i).end_blk >= blk_addr) { - blk_addr -= FDEV(i).start_blk; - bdev = FDEV(i).bdev; - break; + if (f2fs_is_multi_device(sbi)) { + for (i = 0; i < sbi->s_ndevs; i++) { + if (FDEV(i).start_blk <= blk_addr && + FDEV(i).end_blk >= blk_addr) { + blk_addr -= FDEV(i).start_blk; + bdev = FDEV(i).bdev; + break; + } } } if (bio) { @@ -237,6 +239,9 @@ int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr) { int i; + if (!f2fs_is_multi_device(sbi)) + return 0; + for (i = 0; i < sbi->s_ndevs; i++) if (FDEV(i).start_blk <= blkaddr && FDEV(i).end_blk >= blkaddr) return i; @@ -420,7 +425,7 @@ static void __submit_merged_write_cond(struct f2fs_sb_info *sbi, void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type) { - __submit_merged_write_cond(sbi, NULL, 0, 0, type, true); + __submit_merged_write_cond(sbi, NULL, NULL, 0, type, true); } void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi, @@ -448,7 +453,8 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) fio->encrypted_page : fio->page; if (!f2fs_is_valid_blkaddr(fio->sbi, fio->new_blkaddr, - __is_meta_io(fio) ? META_GENERIC : DATA_GENERIC)) + fio->is_por ? META_POR : (__is_meta_io(fio) ? + META_GENERIC : DATA_GENERIC_ENHANCE))) return -EFAULT; trace_f2fs_submit_page_bio(page, fio); @@ -498,9 +504,7 @@ next: spin_unlock(&io->io_lock); } - if (__is_valid_data_blkaddr(fio->old_blkaddr)) - verify_block_addr(fio, fio->old_blkaddr); - verify_block_addr(fio, fio->new_blkaddr); + verify_fio_blkaddr(fio); bio_page = fio->encrypted_page ? 
fio->encrypted_page : fio->page; @@ -557,9 +561,6 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr, struct bio_post_read_ctx *ctx; unsigned int post_read_steps = 0; - if (!f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) - return ERR_PTR(-EFAULT); - bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false); if (!bio) return ERR_PTR(-ENOMEM); @@ -587,8 +588,10 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr, static int f2fs_submit_page_read(struct inode *inode, struct page *page, block_t blkaddr) { - struct bio *bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0); + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct bio *bio; + bio = f2fs_grab_read_bio(inode, blkaddr, 1, 0); if (IS_ERR(bio)) return PTR_ERR(bio); @@ -600,8 +603,8 @@ static int f2fs_submit_page_read(struct inode *inode, struct page *page, return -EFAULT; } ClearPageError(page); - inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA); - __submit_bio(F2FS_I_SB(inode), bio, DATA); + inc_page_count(sbi, F2FS_RD_DATA); + __submit_bio(sbi, bio, DATA); return 0; } @@ -729,6 +732,11 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, if (f2fs_lookup_extent_cache(inode, index, &ei)) { dn.data_blkaddr = ei.blk + index - ei.fofs; + if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), dn.data_blkaddr, + DATA_GENERIC_ENHANCE_READ)) { + err = -EFAULT; + goto put_err; + } goto got_it; } @@ -742,6 +750,13 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, err = -ENOENT; goto put_err; } + if (dn.data_blkaddr != NEW_ADDR && + !f2fs_is_valid_blkaddr(F2FS_I_SB(inode), + dn.data_blkaddr, + DATA_GENERIC_ENHANCE)) { + err = -EFAULT; + goto put_err; + } got_it: if (PageUptodate(page)) { unlock_page(page); @@ -1084,12 +1099,12 @@ next_block: blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); if (__is_valid_data_blkaddr(blkaddr) && - !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) { + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC_ENHANCE)) { err = -EFAULT; goto sync_out; } - if (is_valid_data_blkaddr(sbi, blkaddr)) { + if (__is_valid_data_blkaddr(blkaddr)) { /* use out-place-update for driect IO under LFS mode */ if (test_opt(sbi, LFS) && flag == F2FS_GET_BLOCK_DIO && map->m_may_create) { @@ -1499,6 +1514,118 @@ out: return ret; } +static int f2fs_read_single_page(struct inode *inode, struct page *page, + unsigned nr_pages, + struct f2fs_map_blocks *map, + struct bio **bio_ret, + sector_t *last_block_in_bio, + bool is_readahead) +{ + struct bio *bio = *bio_ret; + const unsigned blkbits = inode->i_blkbits; + const unsigned blocksize = 1 << blkbits; + sector_t block_in_file; + sector_t last_block; + sector_t last_block_in_file; + sector_t block_nr; + int ret = 0; + + block_in_file = (sector_t)page->index; + last_block = block_in_file + nr_pages; + last_block_in_file = (i_size_read(inode) + blocksize - 1) >> + blkbits; + if (last_block > last_block_in_file) + last_block = last_block_in_file; + + /* just zeroing out page which is beyond EOF */ + if (block_in_file >= last_block) + goto zero_out; + /* + * Map blocks using the previous result first. + */ + if ((map->m_flags & F2FS_MAP_MAPPED) && + block_in_file > map->m_lblk && + block_in_file < (map->m_lblk + map->m_len)) + goto got_it; + + /* + * Then do more f2fs_map_blocks() calls until we are + * done with this page. 
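+	 * The result is kept in *map, which the caller shares across all
+	 * pages of the request, so later pages can hit the reuse check above.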
+ */ + map->m_lblk = block_in_file; + map->m_len = last_block - block_in_file; + + ret = f2fs_map_blocks(inode, map, 0, F2FS_GET_BLOCK_DEFAULT); + if (ret) + goto out; +got_it: + if ((map->m_flags & F2FS_MAP_MAPPED)) { + block_nr = map->m_pblk + block_in_file - map->m_lblk; + SetPageMappedToDisk(page); + + if (!PageUptodate(page) && !cleancache_get_page(page)) { + SetPageUptodate(page); + goto confused; + } + + if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, + DATA_GENERIC_ENHANCE_READ)) { + ret = -EFAULT; + goto out; + } + } else { +zero_out: + zero_user_segment(page, 0, PAGE_SIZE); + if (!PageUptodate(page)) + SetPageUptodate(page); + unlock_page(page); + goto out; + } + + /* + * This page will go to BIO. Do we need to send this + * BIO off first? + */ + if (bio && (*last_block_in_bio != block_nr - 1 || + !__same_bdev(F2FS_I_SB(inode), block_nr, bio))) { +submit_and_realloc: + __submit_bio(F2FS_I_SB(inode), bio, DATA); + bio = NULL; + } + if (bio == NULL) { + bio = f2fs_grab_read_bio(inode, block_nr, nr_pages, + is_readahead ? REQ_RAHEAD : 0); + if (IS_ERR(bio)) { + ret = PTR_ERR(bio); + bio = NULL; + goto out; + } + } + + /* + * If the page is under writeback, we need to wait for + * its completion to see the correct decrypted data. + */ + f2fs_wait_on_block_writeback(inode, block_nr); + + if (bio_add_page(bio, page, blocksize, 0) < blocksize) + goto submit_and_realloc; + + inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA); + ClearPageError(page); + *last_block_in_bio = block_nr; + goto out; +confused: + if (bio) { + __submit_bio(F2FS_I_SB(inode), bio, DATA); + bio = NULL; + } + unlock_page(page); +out: + *bio_ret = bio; + return ret; +} + /* * This function was originally taken from fs/mpage.c, and customized for f2fs. * Major change was from block_size == page_size in f2fs by default. @@ -1515,13 +1642,8 @@ static int f2fs_mpage_readpages(struct address_space *mapping, struct bio *bio = NULL; sector_t last_block_in_bio = 0; struct inode *inode = mapping->host; - const unsigned blkbits = inode->i_blkbits; - const unsigned blocksize = 1 << blkbits; - sector_t block_in_file; - sector_t last_block; - sector_t last_block_in_file; - sector_t block_nr; struct f2fs_map_blocks map; + int ret = 0; map.m_pblk = 0; map.m_lblk = 0; @@ -1544,98 +1666,13 @@ static int f2fs_mpage_readpages(struct address_space *mapping, goto next_page; } - block_in_file = (sector_t)page->index; - last_block = block_in_file + nr_pages; - last_block_in_file = (i_size_read(inode) + blocksize - 1) >> - blkbits; - if (last_block > last_block_in_file) - last_block = last_block_in_file; - - /* just zeroing out page which is beyond EOF */ - if (block_in_file >= last_block) - goto zero_out; - /* - * Map blocks using the previous result first. - */ - if ((map.m_flags & F2FS_MAP_MAPPED) && - block_in_file > map.m_lblk && - block_in_file < (map.m_lblk + map.m_len)) - goto got_it; - - /* - * Then do more f2fs_map_blocks() calls until we are - * done with this page. 
- */ - map.m_lblk = block_in_file; - map.m_len = last_block - block_in_file; - - if (f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT)) - goto set_error_page; -got_it: - if ((map.m_flags & F2FS_MAP_MAPPED)) { - block_nr = map.m_pblk + block_in_file - map.m_lblk; - SetPageMappedToDisk(page); - - if (!PageUptodate(page) && !cleancache_get_page(page)) { - SetPageUptodate(page); - goto confused; - } - - if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, - DATA_GENERIC)) - goto set_error_page; - } else { -zero_out: + ret = f2fs_read_single_page(inode, page, nr_pages, &map, &bio, + &last_block_in_bio, is_readahead); + if (ret) { + SetPageError(page); zero_user_segment(page, 0, PAGE_SIZE); - if (!PageUptodate(page)) - SetPageUptodate(page); unlock_page(page); - goto next_page; } - - /* - * This page will go to BIO. Do we need to send this - * BIO off first? - */ - if (bio && (last_block_in_bio != block_nr - 1 || - !__same_bdev(F2FS_I_SB(inode), block_nr, bio))) { -submit_and_realloc: - __submit_bio(F2FS_I_SB(inode), bio, DATA); - bio = NULL; - } - if (bio == NULL) { - bio = f2fs_grab_read_bio(inode, block_nr, nr_pages, - is_readahead ? REQ_RAHEAD : 0); - if (IS_ERR(bio)) { - bio = NULL; - goto set_error_page; - } - } - - /* - * If the page is under writeback, we need to wait for - * its completion to see the correct decrypted data. - */ - f2fs_wait_on_block_writeback(inode, block_nr); - - if (bio_add_page(bio, page, blocksize, 0) < blocksize) - goto submit_and_realloc; - - inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA); - ClearPageError(page); - last_block_in_bio = block_nr; - goto next_page; -set_error_page: - SetPageError(page); - zero_user_segment(page, 0, PAGE_SIZE); - unlock_page(page); - goto next_page; -confused: - if (bio) { - __submit_bio(F2FS_I_SB(inode), bio, DATA); - bio = NULL; - } - unlock_page(page); next_page: if (pages) put_page(page); @@ -1643,7 +1680,7 @@ next_page: BUG_ON(pages && !list_empty(pages)); if (bio) __submit_bio(F2FS_I_SB(inode), bio, DATA); - return 0; + return pages ? 0 : ret; } static int f2fs_read_data_page(struct file *file, struct page *page) @@ -1813,7 +1850,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) fio->old_blkaddr = ei.blk + page->index - ei.fofs; if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, - DATA_GENERIC)) + DATA_GENERIC_ENHANCE)) return -EFAULT; ipu_force = true; @@ -1840,7 +1877,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) got_it: if (__is_valid_data_blkaddr(fio->old_blkaddr) && !f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, - DATA_GENERIC)) { + DATA_GENERIC_ENHANCE)) { err = -EFAULT; goto out_writepage; } @@ -1848,7 +1885,8 @@ got_it: * If current allocation needs SSR, * it had better in-place writes for updated data. 
*/ - if (ipu_force || (is_valid_data_blkaddr(fio->sbi, fio->old_blkaddr) && + if (ipu_force || + (__is_valid_data_blkaddr(fio->old_blkaddr) && need_inplace_update(fio))) { err = encrypt_one_page(fio); if (err) @@ -1866,9 +1904,10 @@ got_it: true); if (PageWriteback(page)) end_page_writeback(page); + } else { + set_inode_flag(inode, FI_UPDATE_WRITE); } trace_f2fs_do_write_data_page(fio->page, IPU); - set_inode_flag(inode, FI_UPDATE_WRITE); return err; } @@ -2030,7 +2069,8 @@ out: } unlock_page(page); - if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) + if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode) && + !F2FS_I(inode)->cp_task) f2fs_balance_fs(sbi, need_balance_fs); if (unlikely(f2fs_cp_error(sbi))) { @@ -2491,6 +2531,11 @@ repeat: zero_user_segment(page, 0, PAGE_SIZE); SetPageUptodate(page); } else { + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, + DATA_GENERIC_ENHANCE_READ)) { + err = -EFAULT; + goto fail; + } err = f2fs_submit_page_read(inode, page, blkaddr); if (err) goto fail; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index bacf5c2a8850..06b89a9862ab 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -210,7 +210,14 @@ enum { META_SSA, META_MAX, META_POR, - DATA_GENERIC, + DATA_GENERIC, /* check range only */ + DATA_GENERIC_ENHANCE, /* strong check on range and segment bitmap */ + DATA_GENERIC_ENHANCE_READ, /* + * strong check on range and segment + * bitmap but no warning due to race + * condition of read on truncated area + * by extent_cache + */ META_GENERIC, }; @@ -1041,7 +1048,7 @@ struct f2fs_io_info { bool submitted; /* indicate IO submission */ int need_lock; /* indicate we need to lock cp_rwsem */ bool in_list; /* indicate fio is in io_list */ - bool is_meta; /* indicate borrow meta inode mapping or not */ + bool is_por; /* indicate IO is from recovery or not */ bool retry; /* need to reallocate block address */ enum iostat_type io_type; /* io type */ struct writeback_control *io_wbc; /* writeback control */ @@ -1068,8 +1075,8 @@ struct f2fs_dev_info { block_t start_blk; block_t end_blk; #ifdef CONFIG_BLK_DEV_ZONED - unsigned int nr_blkz; /* Total number of zones */ - u8 *blkz_type; /* Array of zones type */ + unsigned int nr_blkz; /* Total number of zones */ + unsigned long *blkz_seq; /* Bitmap indicating sequential zones */ #endif }; @@ -1366,6 +1373,17 @@ static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type) } #endif +/* + * Test if the mounted volume is a multi-device volume. + * - For a single regular disk volume, sbi->s_ndevs is 0. + * - For a single zoned disk volume, sbi->s_ndevs is 1. + * - For a multi-device volume, sbi->s_ndevs is always 2 or more. + */ +static inline bool f2fs_is_multi_device(struct f2fs_sb_info *sbi) +{ + return sbi->s_ndevs > 1; +} + /* For write statistics. Suppose sector size is 512 bytes, * and the return value is in kbytes. s is of struct f2fs_sb_info. 
*/ @@ -1777,6 +1795,7 @@ enospc: return -ENOSPC; } +void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); static inline void dec_valid_block_count(struct f2fs_sb_info *sbi, struct inode *inode, block_t count) @@ -1785,13 +1804,21 @@ static inline void dec_valid_block_count(struct f2fs_sb_info *sbi, spin_lock(&sbi->stat_lock); f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count); - f2fs_bug_on(sbi, inode->i_blocks < sectors); sbi->total_valid_block_count -= (block_t)count; if (sbi->reserved_blocks && sbi->current_reserved_blocks < sbi->reserved_blocks) sbi->current_reserved_blocks = min(sbi->reserved_blocks, sbi->current_reserved_blocks + count); spin_unlock(&sbi->stat_lock); + if (unlikely(inode->i_blocks < sectors)) { + f2fs_msg(sbi->sb, KERN_WARNING, + "Inconsistent i_blocks, ino:%lu, iblocks:%llu, sectors:%llu", + inode->i_ino, + (unsigned long long)inode->i_blocks, + (unsigned long long)sectors); + set_sbi_flag(sbi, SBI_NEED_FSCK); + return; + } f2fs_i_blocks_write(inode, count, false, true); } @@ -1889,7 +1916,11 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag) if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) { offset = (flag == SIT_BITMAP) ? le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0; - return &ckpt->sit_nat_version_bitmap + offset; + /* + * if large_nat_bitmap feature is enabled, leave checksum + * protection for all nat/sit bitmaps. + */ + return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32); } if (__cp_payload(sbi) > 0) { @@ -2008,7 +2039,6 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi, f2fs_bug_on(sbi, !sbi->total_valid_block_count); f2fs_bug_on(sbi, !sbi->total_valid_node_count); - f2fs_bug_on(sbi, !is_inode && !inode->i_blocks); sbi->total_valid_node_count--; sbi->total_valid_block_count--; @@ -2018,10 +2048,19 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi, spin_unlock(&sbi->stat_lock); - if (is_inode) + if (is_inode) { dquot_free_inode(inode); - else + } else { + if (unlikely(inode->i_blocks == 0)) { + f2fs_msg(sbi->sb, KERN_WARNING, + "Inconsistent i_blocks, ino:%lu, iblocks:%llu", + inode->i_ino, + (unsigned long long)inode->i_blocks); + set_sbi_flag(sbi, SBI_NEED_FSCK); + return; + } f2fs_i_blocks_write(inode, 1, false, true); + } } static inline unsigned int valid_node_count(struct f2fs_sb_info *sbi) @@ -2545,7 +2584,14 @@ static inline int f2fs_has_inline_xattr(struct inode *inode) static inline unsigned int addrs_per_inode(struct inode *inode) { - return CUR_ADDRS_PER_INODE(inode) - get_inline_xattr_addrs(inode); + unsigned int addrs = CUR_ADDRS_PER_INODE(inode) - + get_inline_xattr_addrs(inode); + return ALIGN_DOWN(addrs, 1); +} + +static inline unsigned int addrs_per_block(struct inode *inode) +{ + return ALIGN_DOWN(DEF_ADDRS_PER_BLOCK, 1); } static inline void *inline_xattr_addr(struct inode *inode, struct page *page) @@ -2558,7 +2604,9 @@ static inline void *inline_xattr_addr(struct inode *inode, struct page *page) static inline int inline_xattr_size(struct inode *inode) { - return get_inline_xattr_addrs(inode) * sizeof(__le32); + if (f2fs_has_inline_xattr(inode)) + return get_inline_xattr_addrs(inode) * sizeof(__le32); + return 0; } static inline int f2fs_has_inline_data(struct inode *inode) @@ -2800,12 +2848,10 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, #define __is_large_section(sbi) ((sbi)->segs_per_sec > 1) -#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO((fio)->type) == META && \ - (!is_read_io((fio)->op) || 
(fio)->is_meta)) +#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO((fio)->type) == META) bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); -void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); static inline void verify_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) { @@ -2824,15 +2870,6 @@ static inline bool __is_valid_data_blkaddr(block_t blkaddr) return true; } -static inline bool is_valid_data_blkaddr(struct f2fs_sb_info *sbi, - block_t blkaddr) -{ - if (!__is_valid_data_blkaddr(blkaddr)) - return false; - verify_blkaddr(sbi, blkaddr, DATA_GENERIC); - return true; -} - static inline void f2fs_set_page_private(struct page *page, unsigned long data) { @@ -3530,16 +3567,12 @@ F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND); F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM); #ifdef CONFIG_BLK_DEV_ZONED -static inline int get_blkz_type(struct f2fs_sb_info *sbi, - struct block_device *bdev, block_t blkaddr) +static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi, int devi, + block_t blkaddr) { unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz; - int i; - for (i = 0; i < sbi->s_ndevs; i++) - if (FDEV(i).bdev == bdev) - return FDEV(i).blkz_type[zno]; - return -EINVAL; + return test_bit(zno, FDEV(devi).blkz_seq); } #endif @@ -3548,9 +3581,23 @@ static inline bool f2fs_hw_should_discard(struct f2fs_sb_info *sbi) return f2fs_sb_has_blkzoned(sbi); } +static inline bool f2fs_bdev_support_discard(struct block_device *bdev) +{ + return blk_queue_discard(bdev_get_queue(bdev)) || + bdev_is_zoned(bdev); +} + static inline bool f2fs_hw_support_discard(struct f2fs_sb_info *sbi) { - return blk_queue_discard(bdev_get_queue(sbi->sb->s_bdev)); + int i; + + if (!f2fs_is_multi_device(sbi)) + return f2fs_bdev_support_discard(sbi->sb->s_bdev); + + for (i = 0; i < sbi->s_ndevs; i++) + if (f2fs_bdev_support_discard(FDEV(i).bdev)) + return true; + return false; } static inline bool f2fs_realtime_discard_enable(struct f2fs_sb_info *sbi) @@ -3559,6 +3606,20 @@ static inline bool f2fs_realtime_discard_enable(struct f2fs_sb_info *sbi) f2fs_hw_should_discard(sbi); } +static inline bool f2fs_hw_is_readonly(struct f2fs_sb_info *sbi) +{ + int i; + + if (!f2fs_is_multi_device(sbi)) + return bdev_read_only(sbi->sb->s_bdev); + + for (i = 0; i < sbi->s_ndevs; i++) + if (bdev_read_only(FDEV(i).bdev)) + return true; + return false; +} + + static inline void set_opt_mode(struct f2fs_sb_info *sbi, unsigned int mt) { clear_opt(sbi, ADAPTIVE); @@ -3614,7 +3675,7 @@ static inline bool f2fs_force_buffered_io(struct inode *inode, if (f2fs_post_read_required(inode)) return true; - if (sbi->s_ndevs) + if (f2fs_is_multi_device(sbi)) return true; /* * for blkzoned device, fallback direct IO to buffered IO, so @@ -3651,4 +3712,4 @@ static inline bool is_journalled_quota(struct f2fs_sb_info *sbi) return false; } -#endif +#endif /* _LINUX_F2FS_H */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 5742ab8b57dc..45b45f37d347 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -39,6 +39,8 @@ static vm_fault_t f2fs_filemap_fault(struct vm_fault *vmf) ret = filemap_fault(vmf); up_read(&F2FS_I(inode)->i_mmap_sem); + trace_f2fs_filemap_fault(inode, vmf->pgoff, (unsigned long)ret); + return ret; } @@ -356,7 +358,7 @@ static bool __found_offset(struct f2fs_sb_info *sbi, block_t blkaddr, switch (whence) { case SEEK_DATA: if ((blkaddr == NEW_ADDR && dirty == pgofs) || - is_valid_data_blkaddr(sbi, blkaddr)) + __is_valid_data_blkaddr(blkaddr)) return true; break; case SEEK_HOLE: @@ -422,7 
+424,7 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) if (__is_valid_data_blkaddr(blkaddr) && !f2fs_is_valid_blkaddr(F2FS_I_SB(inode), - blkaddr, DATA_GENERIC)) { + blkaddr, DATA_GENERIC_ENHANCE)) { f2fs_put_dnode(&dn); goto fail; } @@ -523,7 +525,8 @@ void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count) f2fs_set_data_blkaddr(dn); if (__is_valid_data_blkaddr(blkaddr) && - !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) + !f2fs_is_valid_blkaddr(sbi, blkaddr, + DATA_GENERIC_ENHANCE)) continue; f2fs_invalidate_blocks(sbi, blkaddr); @@ -552,7 +555,7 @@ void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count) void f2fs_truncate_data_blocks(struct dnode_of_data *dn) { - f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK); + f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK(dn->inode)); } static int truncate_partial_data_page(struct inode *inode, u64 from, @@ -1006,7 +1009,8 @@ next_dnode: } else if (ret == -ENOENT) { if (dn.max_level == 0) return -ENOENT; - done = min((pgoff_t)ADDRS_PER_BLOCK - dn.ofs_in_node, len); + done = min((pgoff_t)ADDRS_PER_BLOCK(inode) - dn.ofs_in_node, + len); blkaddr += done; do_replace += done; goto next; @@ -1017,6 +1021,14 @@ next_dnode: for (i = 0; i < done; i++, blkaddr++, do_replace++, dn.ofs_in_node++) { *blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); + + if (__is_valid_data_blkaddr(*blkaddr) && + !f2fs_is_valid_blkaddr(sbi, *blkaddr, + DATA_GENERIC_ENHANCE)) { + f2fs_put_dnode(&dn); + return -EFAULT; + } + if (!f2fs_is_checkpointed_data(sbi, *blkaddr)) { if (test_opt(sbi, LFS)) { @@ -1157,7 +1169,7 @@ static int __exchange_data_block(struct inode *src_inode, int ret; while (len) { - olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len); + olen = min((pgoff_t)4 * ADDRS_PER_BLOCK(src_inode), len); src_blkaddr = f2fs_kvzalloc(F2FS_I_SB(src_inode), array_size(olen, sizeof(block_t)), @@ -2573,10 +2585,10 @@ static int f2fs_ioc_flush_device(struct file *filp, unsigned long arg) sizeof(range))) return -EFAULT; - if (sbi->s_ndevs <= 1 || sbi->s_ndevs - 1 <= range.dev_num || + if (!f2fs_is_multi_device(sbi) || sbi->s_ndevs - 1 <= range.dev_num || __is_large_section(sbi)) { f2fs_msg(sbi->sb, KERN_WARNING, - "Can't flush %u in %d for segs_per_sec %u != 1\n", + "Can't flush %u in %d for segs_per_sec %u != 1", range.dev_num, sbi->s_ndevs, sbi->segs_per_sec); return -EINVAL; @@ -2858,7 +2870,7 @@ int f2fs_pin_file_control(struct inode *inode, bool inc) if (fi->i_gc_failures[GC_FAILURE_PIN] > sbi->gc_pin_file_threshold) { f2fs_msg(sbi->sb, KERN_WARNING, - "%s: Enable GC = ino %lx after %x GC trials\n", + "%s: Enable GC = ino %lx after %x GC trials", __func__, inode->i_ino, fi->i_gc_failures[GC_FAILURE_PIN]); clear_inode_flag(inode, FI_PIN_FILE); @@ -3035,15 +3047,21 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(file); ssize_t ret; - if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) - return -EIO; + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) { + ret = -EIO; + goto out; + } - if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) - return -EINVAL; + if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) { + ret = -EINVAL; + goto out; + } if (!inode_trylock(inode)) { - if (iocb->ki_flags & IOCB_NOWAIT) - return -EAGAIN; + if (iocb->ki_flags & IOCB_NOWAIT) { + ret = -EAGAIN; + goto out; + } inode_lock(inode); } @@ -3056,19 +3074,16 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from) if (iov_iter_fault_in_readable(from, iov_iter_count(from))) set_inode_flag(inode, FI_NO_PREALLOC); - if ((iocb->ki_flags & IOCB_NOWAIT) && - (iocb->ki_flags & IOCB_DIRECT)) { - if (!f2fs_overwrite_io(inode, iocb->ki_pos, + if ((iocb->ki_flags & IOCB_NOWAIT)) { + if (!f2fs_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)) || - f2fs_has_inline_data(inode) || - f2fs_force_buffered_io(inode, - iocb, from)) { - clear_inode_flag(inode, - FI_NO_PREALLOC); - inode_unlock(inode); - return -EAGAIN; - } - + f2fs_has_inline_data(inode) || + f2fs_force_buffered_io(inode, iocb, from)) { + clear_inode_flag(inode, FI_NO_PREALLOC); + inode_unlock(inode); + ret = -EAGAIN; + goto out; + } } else { preallocated = true; target_size = iocb->ki_pos + iov_iter_count(from); @@ -3077,7 +3092,8 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (err) { clear_inode_flag(inode, FI_NO_PREALLOC); inode_unlock(inode); - return err; + ret = err; + goto out; } } ret = __generic_file_write_iter(iocb, from); @@ -3091,7 +3107,9 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret); } inode_unlock(inode); - +out: + trace_f2fs_file_write_iter(inode, iocb->ki_pos, + iov_iter_count(from), ret); if (ret > 0) ret = generic_write_sync(iocb, ret); return ret; diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 195cf0f9d9ef..963fb4571fd9 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -591,7 +591,7 @@ block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode) int dec = (node_ofs - indirect_blks - 3) / (NIDS_PER_BLOCK + 1); bidx = node_ofs - 5 - dec; } - return bidx * ADDRS_PER_BLOCK + ADDRS_PER_INODE(inode); + return bidx * ADDRS_PER_BLOCK(inode) + ADDRS_PER_INODE(inode); } static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, @@ -656,6 +656,11 @@ static int ra_data_block(struct inode *inode, pgoff_t index) if (f2fs_lookup_extent_cache(inode, index, &ei)) { dn.data_blkaddr = ei.blk + index - ei.fofs; + if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr, + DATA_GENERIC_ENHANCE_READ))) { + err = -EFAULT; + goto put_page; + } goto got_it; } @@ -665,8 +670,12 @@ static int ra_data_block(struct inode *inode, pgoff_t index) goto put_page; f2fs_put_dnode(&dn); + if (!__is_valid_data_blkaddr(dn.data_blkaddr)) { + err = -ENOENT; + goto put_page; + } if (unlikely(!f2fs_is_valid_blkaddr(sbi, dn.data_blkaddr, - DATA_GENERIC))) { + DATA_GENERIC_ENHANCE))) { err = -EFAULT; goto put_page; } @@ -1175,6 +1184,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, "type [%d, %d] in SSA and SIT", segno, type, GET_SUM_TYPE((&sum->footer))); set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_stop_checkpoint(sbi, false); goto skip; } @@ -1346,7 +1356,7 @@ void f2fs_build_gc_manager(struct f2fs_sb_info *sbi) sbi->gc_pin_file_threshold = DEF_GC_FAILED_PINNED_FILES; /* give warm/cold data area from slower device */ - if (sbi->s_ndevs && !__is_large_section(sbi)) + if (f2fs_is_multi_device(sbi) && !__is_large_section(sbi)) SIT_I(sbi)->last_victim[ALLOC_NEXT] = GET_SEGNO(sbi, FDEV(0).end_blk) + 1; } diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index bb6a152310ef..404d2462a0fe 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -420,6 +420,14 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, stat_dec_inline_dir(dir); clear_inode_flag(dir, FI_INLINE_DENTRY); + /* + * should retrieve reserved space which was used to keep + * inline_dentry's structure 
for backward compatibility. + */ + if (!f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(dir)) && + !f2fs_has_inline_xattr(dir)) + F2FS_I(dir)->i_inline_xattr_size = 0; + f2fs_i_depth_write(dir, 1); if (i_size_read(dir) < PAGE_SIZE) f2fs_i_size_write(dir, PAGE_SIZE); @@ -501,6 +509,15 @@ static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage, stat_dec_inline_dir(dir); clear_inode_flag(dir, FI_INLINE_DENTRY); + + /* + * should retrieve reserved space which was used to keep + * inline_dentry's structure for backward compatibility. + */ + if (!f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(dir)) && + !f2fs_has_inline_xattr(dir)) + F2FS_I(dir)->i_inline_xattr_size = 0; + kvfree(backup_dentry); return 0; recover: diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index e7f2e8759315..ccb02226dd2c 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -73,7 +73,7 @@ static int __written_first_block(struct f2fs_sb_info *sbi, if (!__is_valid_data_blkaddr(addr)) return 1; - if (!f2fs_is_valid_blkaddr(sbi, addr, DATA_GENERIC)) + if (!f2fs_is_valid_blkaddr(sbi, addr, DATA_GENERIC_ENHANCE)) return -EFAULT; return 0; } @@ -177,8 +177,8 @@ bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct page *page) if (provided != calculated) f2fs_msg(sbi->sb, KERN_WARNING, - "checksum invalid, ino = %x, %x vs. %x", - ino_of_node(page), provided, calculated); + "checksum invalid, nid = %lu, ino_of_node = %x, %x vs. %x", + page->index, ino_of_node(page), provided, calculated); return provided == calculated; } @@ -267,9 +267,10 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page) struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest; if (ei->len && - (!f2fs_is_valid_blkaddr(sbi, ei->blk, DATA_GENERIC) || + (!f2fs_is_valid_blkaddr(sbi, ei->blk, + DATA_GENERIC_ENHANCE) || !f2fs_is_valid_blkaddr(sbi, ei->blk + ei->len - 1, - DATA_GENERIC))) { + DATA_GENERIC_ENHANCE))) { set_sbi_flag(sbi, SBI_NEED_FSCK); f2fs_msg(sbi->sb, KERN_WARNING, "%s: inode (ino=%lx) extent info [%u, %u, %u] " @@ -488,6 +489,7 @@ make_now: return inode; bad_inode: + f2fs_inode_synced(inode); iget_failed(inode); trace_f2fs_iget_exit(inode, ret); return ERR_PTR(ret); diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index c3e8a901d47a..0f77f9242751 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -143,7 +143,7 @@ fail_drop: return ERR_PTR(err); } -static int is_extension_exist(const unsigned char *s, const char *sub) +static inline int is_extension_exist(const unsigned char *s, const char *sub) { size_t slen = strlen(s); size_t sublen = strlen(sub); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index d6e48a6487d5..18a038a2a9fa 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -454,7 +454,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, new_blkaddr == NULL_ADDR); f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR && new_blkaddr == NEW_ADDR); - f2fs_bug_on(sbi, is_valid_data_blkaddr(sbi, nat_get_blkaddr(e)) && + f2fs_bug_on(sbi, __is_valid_data_blkaddr(nat_get_blkaddr(e)) && new_blkaddr == NEW_ADDR); /* increment version no as node is removed */ @@ -465,7 +465,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, /* change address */ nat_set_blkaddr(e, new_blkaddr); - if (!is_valid_data_blkaddr(sbi, new_blkaddr)) + if (!__is_valid_data_blkaddr(new_blkaddr)) set_nat_flag(e, IS_CHECKPOINTED, false); __set_nat_cache_dirty(nm_i, e); @@ -526,6 +526,7 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct f2fs_nat_entry ne; struct 
nat_entry *e; pgoff_t index; + block_t blkaddr; int i; ni->nid = nid; @@ -569,6 +570,11 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, node_info_from_raw_nat(ni, &ne); f2fs_put_page(page, 1); cache: + blkaddr = le32_to_cpu(ne.block_addr); + if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC_ENHANCE)) + return -EFAULT; + /* cache nat entry */ cache_nat_entry(sbi, nid, &ne); return 0; @@ -600,9 +606,9 @@ static void f2fs_ra_node_pages(struct page *parent, int start, int n) pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs) { const long direct_index = ADDRS_PER_INODE(dn->inode); - const long direct_blks = ADDRS_PER_BLOCK; - const long indirect_blks = ADDRS_PER_BLOCK * NIDS_PER_BLOCK; - unsigned int skipped_unit = ADDRS_PER_BLOCK; + const long direct_blks = ADDRS_PER_BLOCK(dn->inode); + const long indirect_blks = ADDRS_PER_BLOCK(dn->inode) * NIDS_PER_BLOCK; + unsigned int skipped_unit = ADDRS_PER_BLOCK(dn->inode); int cur_level = dn->cur_level; int max_level = dn->max_level; pgoff_t base = 0; @@ -638,9 +644,9 @@ static int get_node_path(struct inode *inode, long block, int offset[4], unsigned int noffset[4]) { const long direct_index = ADDRS_PER_INODE(inode); - const long direct_blks = ADDRS_PER_BLOCK; + const long direct_blks = ADDRS_PER_BLOCK(inode); const long dptrs_per_blk = NIDS_PER_BLOCK; - const long indirect_blks = ADDRS_PER_BLOCK * NIDS_PER_BLOCK; + const long indirect_blks = ADDRS_PER_BLOCK(inode) * NIDS_PER_BLOCK; const long dindirect_blks = indirect_blks * NIDS_PER_BLOCK; int n = 0; int level = 0; @@ -1181,8 +1187,14 @@ int f2fs_remove_inode_page(struct inode *inode) f2fs_put_dnode(&dn); return -EIO; } - f2fs_bug_on(F2FS_I_SB(inode), - inode->i_blocks != 0 && inode->i_blocks != 8); + + if (unlikely(inode->i_blocks != 0 && inode->i_blocks != 8)) { + f2fs_msg(F2FS_I_SB(inode)->sb, KERN_WARNING, + "Inconsistent i_blocks, ino:%lu, iblocks:%llu", + inode->i_ino, + (unsigned long long)inode->i_blocks); + set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_FSCK); + } /* will put inode & node pages */ err = truncate_node(&dn); @@ -1277,9 +1289,10 @@ static int read_node_page(struct page *page, int op_flags) int err; if (PageUptodate(page)) { -#ifdef CONFIG_F2FS_CHECK_FS - f2fs_bug_on(sbi, !f2fs_inode_chksum_verify(sbi, page)); -#endif + if (!f2fs_inode_chksum_verify(sbi, page)) { + ClearPageUptodate(page); + return -EBADMSG; + } return LOCKED_PAGE; } @@ -1543,7 +1556,8 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, } if (__is_valid_data_blkaddr(ni.blk_addr) && - !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, DATA_GENERIC)) { + !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, + DATA_GENERIC_ENHANCE)) { up_read(&sbi->node_write); goto redirty_out; } @@ -2078,6 +2092,9 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, if (unlikely(nid == 0)) return false; + if (unlikely(f2fs_check_nid_range(sbi, nid))) + return false; + i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS); i->nid = nid; i->state = FREE_NID; diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index e3883db868d8..e04f82b3f4fc 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -325,8 +325,10 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, break; } - if (!is_recoverable_dnode(page)) + if (!is_recoverable_dnode(page)) { + f2fs_put_page(page, 1); break; + } if (!is_fsync_dnode(page)) goto next; @@ -338,8 +340,10 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, if 
(!check_only && IS_INODE(page) && is_dent_dnode(page)) { err = f2fs_recover_inode_page(sbi, page); - if (err) + if (err) { + f2fs_put_page(page, 1); break; + } quota_inode = true; } @@ -355,6 +359,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, err = 0; goto next; } + f2fs_put_page(page, 1); break; } } @@ -370,6 +375,7 @@ next: "%s: detect looped node chain, " "blkaddr:%u, next:%u", __func__, blkaddr, next_blkaddr_of_node(page)); + f2fs_put_page(page, 1); err = -EINVAL; break; } @@ -380,7 +386,6 @@ next: f2fs_ra_meta_pages_cond(sbi, blkaddr); } - f2fs_put_page(page, 1); return err; } @@ -546,7 +551,15 @@ retry_dn: goto err; f2fs_bug_on(sbi, ni.ino != ino_of_node(page)); - f2fs_bug_on(sbi, ofs_of_node(dn.node_page) != ofs_of_node(page)); + + if (ofs_of_node(dn.node_page) != ofs_of_node(page)) { + f2fs_msg(sbi->sb, KERN_WARNING, + "Inconsistent ofs_of_node, ino:%lu, ofs:%u, %u", + inode->i_ino, ofs_of_node(dn.node_page), + ofs_of_node(page)); + err = -EFAULT; + goto err; + } for (; start < end; start++, dn.ofs_in_node++) { block_t src, dest; @@ -554,6 +567,18 @@ retry_dn: src = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); dest = datablock_addr(dn.inode, page, dn.ofs_in_node); + if (__is_valid_data_blkaddr(src) && + !f2fs_is_valid_blkaddr(sbi, src, META_POR)) { + err = -EFAULT; + goto err; + } + + if (__is_valid_data_blkaddr(dest) && + !f2fs_is_valid_blkaddr(sbi, dest, META_POR)) { + err = -EFAULT; + goto err; + } + /* skip recovering if dest is the same as src */ if (src == dest) continue; @@ -666,8 +691,10 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list, */ if (IS_INODE(page)) { err = recover_inode(entry->inode, page); - if (err) + if (err) { + f2fs_put_page(page, 1); break; + } } if (entry->last_dentry == blkaddr) { err = recover_dentry(entry->inode, page, dir_list); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index aa7fe79b62b2..8dee063c833f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -580,7 +580,7 @@ static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino) int ret = 0; int i; - if (!sbi->s_ndevs) + if (!f2fs_is_multi_device(sbi)) return __submit_flush_wait(sbi, sbi->sb->s_bdev); for (i = 0; i < sbi->s_ndevs; i++) { @@ -648,7 +648,8 @@ int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino) return ret; } - if (atomic_inc_return(&fcc->queued_flush) == 1 || sbi->s_ndevs > 1) { + if (atomic_inc_return(&fcc->queued_flush) == 1 || + f2fs_is_multi_device(sbi)) { ret = submit_flush_wait(sbi, ino); atomic_dec(&fcc->queued_flush); @@ -754,7 +755,7 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi) { int ret = 0, i; - if (!sbi->s_ndevs) + if (!f2fs_is_multi_device(sbi)) return 0; for (i = 1; i < sbi->s_ndevs; i++) { @@ -1367,9 +1368,12 @@ static int __queue_discard_cmd(struct f2fs_sb_info *sbi, { block_t lblkstart = blkstart; + if (!f2fs_bdev_support_discard(bdev)) + return 0; + trace_f2fs_queue_discard(bdev, blkstart, blklen); - if (sbi->s_ndevs) { + if (f2fs_is_multi_device(sbi)) { int devi = f2fs_target_device_index(sbi, blkstart); blkstart -= FDEV(devi).start_blk; @@ -1732,42 +1736,36 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi, block_t lblkstart = blkstart; int devi = 0; - if (sbi->s_ndevs) { + if (f2fs_is_multi_device(sbi)) { devi = f2fs_target_device_index(sbi, blkstart); + if (blkstart < FDEV(devi).start_blk || + blkstart > FDEV(devi).end_blk) { + f2fs_msg(sbi->sb, KERN_ERR, "Invalid block %x", + blkstart); + return -EIO; + } blkstart -= 
FDEV(devi).start_blk; } - /* - * We need to know the type of the zone: for conventional zones, - * use regular discard if the drive supports it. For sequential - * zones, reset the zone write pointer. - */ - switch (get_blkz_type(sbi, bdev, blkstart)) { - - case BLK_ZONE_TYPE_CONVENTIONAL: - if (!blk_queue_discard(bdev_get_queue(bdev))) - return 0; - return __queue_discard_cmd(sbi, bdev, lblkstart, blklen); - case BLK_ZONE_TYPE_SEQWRITE_REQ: - case BLK_ZONE_TYPE_SEQWRITE_PREF: + /* For sequential zones, reset the zone write pointer */ + if (f2fs_blkz_is_seq(sbi, devi, blkstart)) { sector = SECTOR_FROM_BLOCK(blkstart); nr_sects = SECTOR_FROM_BLOCK(blklen); if (sector & (bdev_zone_sectors(bdev) - 1) || nr_sects != bdev_zone_sectors(bdev)) { - f2fs_msg(sbi->sb, KERN_INFO, - "(%d) %s: Unaligned discard attempted (block %x + %x)", + f2fs_msg(sbi->sb, KERN_ERR, + "(%d) %s: Unaligned zone reset attempted (block %x + %x)", devi, sbi->s_ndevs ? FDEV(devi).path: "", blkstart, blklen); return -EIO; } trace_f2fs_issue_reset_zone(bdev, blkstart); - return blkdev_reset_zones(bdev, sector, - nr_sects, GFP_NOFS); - default: - /* Unknown zone type: broken device ? */ - return -EIO; + return blkdev_reset_zones(bdev, sector, nr_sects, GFP_NOFS); } + + /* For conventional zones, use regular discard if supported */ + return __queue_discard_cmd(sbi, bdev, lblkstart, blklen); } #endif @@ -1775,8 +1773,7 @@ static int __issue_discard_async(struct f2fs_sb_info *sbi, struct block_device *bdev, block_t blkstart, block_t blklen) { #ifdef CONFIG_BLK_DEV_ZONED - if (f2fs_sb_has_blkzoned(sbi) && - bdev_zoned_model(bdev) != BLK_ZONED_NONE) + if (f2fs_sb_has_blkzoned(sbi) && bdev_is_zoned(bdev)) return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen); #endif return __queue_discard_cmd(sbi, bdev, blkstart, blklen); @@ -2172,8 +2169,11 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) * before, we must track that to know how much space we * really have. 
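Since the per-zone type array is now a one-bit-per-zone "sequential" bitmap (blkz_seq), the zoned-discard path above reduces to a two-way dispatch: reset the write pointer of sequential zones, queue a regular discard for conventional ones. An illustrative userspace sketch; the zone layout and names are invented:

#include <stdbool.h>
#include <stdio.h>

#define NR_ZONES 8

/* hypothetical layout: bit n set means zone n is sequential */
static unsigned long blkz_seq = 0xB5;

static bool zone_is_seq(unsigned int zno)
{
        return blkz_seq & (1UL << zno);
}

static int discard_or_reset(unsigned int zno)
{
        if (zone_is_seq(zno)) {
                printf("zone %u: reset write pointer\n", zno);
                return 0;               /* blkdev_reset_zones() analogue */
        }
        printf("zone %u: queue regular discard\n", zno);
        return 0;                       /* __queue_discard_cmd() analogue */
}

int main(void)
{
        for (unsigned int z = 0; z < NR_ZONES; z++)
                discard_or_reset(z);
        return 0;
}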
*/ - if (f2fs_test_bit(offset, se->ckpt_valid_map)) + if (f2fs_test_bit(offset, se->ckpt_valid_map)) { + spin_lock(&sbi->stat_lock); sbi->unusable_block_count++; + spin_unlock(&sbi->stat_lock); + } } if (f2fs_test_and_clear_bit(offset, se->discard_map)) @@ -2220,7 +2220,7 @@ bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr) struct seg_entry *se; bool is_cp = false; - if (!is_valid_data_blkaddr(sbi, blkaddr)) + if (!__is_valid_data_blkaddr(blkaddr)) return true; down_read(&sit_i->sentry_lock); @@ -3089,7 +3089,7 @@ static void update_device_state(struct f2fs_io_info *fio) struct f2fs_sb_info *sbi = fio->sbi; unsigned int devidx; - if (!sbi->s_ndevs) + if (!f2fs_is_multi_device(sbi)) return; devidx = f2fs_target_device_index(sbi, fio->new_blkaddr); @@ -3187,13 +3187,18 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio) { int err; struct f2fs_sb_info *sbi = fio->sbi; + unsigned int segno; fio->new_blkaddr = fio->old_blkaddr; /* i/o temperature is needed for passing down write hints */ __get_segment_type(fio); - f2fs_bug_on(sbi, !IS_DATASEG(get_seg_entry(sbi, - GET_SEGNO(sbi, fio->new_blkaddr))->type)); + segno = GET_SEGNO(sbi, fio->new_blkaddr); + + if (!IS_DATASEG(get_seg_entry(sbi, segno)->type)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + return -EFAULT; + } stat_inc_inplace_blocks(fio->sbi); @@ -3336,7 +3341,7 @@ void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr) if (!f2fs_post_read_required(inode)) return; - if (!is_valid_data_blkaddr(sbi, blkaddr)) + if (!__is_valid_data_blkaddr(blkaddr)) return; cpage = find_lock_page(META_MAPPING(sbi), blkaddr); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 5c7ed0442d6e..429007b8036e 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -82,7 +82,7 @@ (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & ((sbi)->blocks_per_seg - 1)) #define GET_SEGNO(sbi, blk_addr) \ - ((!is_valid_data_blkaddr(sbi, blk_addr)) ? \ + ((!__is_valid_data_blkaddr(blk_addr)) ? \ NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \ GET_SEGNO_FROM_SEG0(sbi, blk_addr))) #define BLKS_PER_SEC(sbi) \ @@ -656,14 +656,15 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno) f2fs_bug_on(sbi, segno > TOTAL_SEGS(sbi) - 1); } -static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr) +static inline void verify_fio_blkaddr(struct f2fs_io_info *fio) { struct f2fs_sb_info *sbi = fio->sbi; - if (__is_meta_io(fio)) - verify_blkaddr(sbi, blk_addr, META_GENERIC); - else - verify_blkaddr(sbi, blk_addr, DATA_GENERIC); + if (__is_valid_data_blkaddr(fio->old_blkaddr)) + verify_blkaddr(sbi, fio->old_blkaddr, __is_meta_io(fio) ? + META_GENERIC : DATA_GENERIC); + verify_blkaddr(sbi, fio->new_blkaddr, __is_meta_io(fio) ? + META_GENERIC : DATA_GENERIC_ENHANCE); } /* @@ -672,7 +673,6 @@ static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr) static inline int check_block_count(struct f2fs_sb_info *sbi, int segno, struct f2fs_sit_entry *raw_sit) { -#ifdef CONFIG_F2FS_CHECK_FS bool is_valid = test_bit_le(0, raw_sit->valid_map) ? 
true : false; int valid_blocks = 0; int cur_pos = 0, next_pos; @@ -699,7 +699,7 @@ static inline int check_block_count(struct f2fs_sb_info *sbi, set_sbi_flag(sbi, SBI_NEED_FSCK); return -EINVAL; } -#endif + /* check segment usage, and check boundary of a given segment number */ if (unlikely(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg || segno > TOTAL_SEGS(sbi) - 1)) { diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 4c55d2ea9df3..6b959bbb336a 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1019,7 +1019,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi) for (i = 0; i < sbi->s_ndevs; i++) { blkdev_put(FDEV(i).bdev, FMODE_EXCL); #ifdef CONFIG_BLK_DEV_ZONED - kvfree(FDEV(i).blkz_type); + kvfree(FDEV(i).blkz_seq); #endif } kvfree(sbi->devs); @@ -1221,10 +1221,13 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf) buf->f_blocks = total_count - start_count; buf->f_bfree = user_block_count - valid_user_blocks(sbi) - sbi->current_reserved_blocks; + + spin_lock(&sbi->stat_lock); if (unlikely(buf->f_bfree <= sbi->unusable_block_count)) buf->f_bfree = 0; else buf->f_bfree -= sbi->unusable_block_count; + spin_unlock(&sbi->stat_lock); if (buf->f_bfree > F2FS_OPTION(sbi).root_reserved_blocks) buf->f_bavail = buf->f_bfree - @@ -1499,9 +1502,15 @@ static int f2fs_disable_checkpoint(struct f2fs_sb_info *sbi) mutex_lock(&sbi->gc_mutex); cpc.reason = CP_PAUSE; set_sbi_flag(sbi, SBI_CP_DISABLED); - f2fs_write_checkpoint(sbi, &cpc); + err = f2fs_write_checkpoint(sbi, &cpc); + if (err) + goto out_unlock; + spin_lock(&sbi->stat_lock); sbi->unusable_block_count = 0; + spin_unlock(&sbi->stat_lock); + +out_unlock: mutex_unlock(&sbi->gc_mutex); restore_flag: sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */ @@ -2271,7 +2280,7 @@ static const struct export_operations f2fs_export_ops = { static loff_t max_file_blocks(void) { loff_t result = 0; - loff_t leaf_count = ADDRS_PER_BLOCK; + loff_t leaf_count = DEF_ADDRS_PER_BLOCK; /* * note: previously, result is equal to (DEF_ADDRS_PER_INODE - @@ -2449,7 +2458,7 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, /* Currently, support only 4KB page cache size */ if (F2FS_BLKSIZE != PAGE_SIZE) { f2fs_msg(sb, KERN_INFO, - "Invalid page_cache_size (%lu), supports only 4KB\n", + "Invalid page_cache_size (%lu), supports only 4KB", PAGE_SIZE); return 1; } @@ -2458,7 +2467,7 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, blocksize = 1 << le32_to_cpu(raw_super->log_blocksize); if (blocksize != F2FS_BLKSIZE) { f2fs_msg(sb, KERN_INFO, - "Invalid blocksize (%u), supports only 4KB\n", + "Invalid blocksize (%u), supports only 4KB", blocksize); return 1; } @@ -2466,7 +2475,7 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, /* check log blocks per segment */ if (le32_to_cpu(raw_super->log_blocks_per_seg) != 9) { f2fs_msg(sb, KERN_INFO, - "Invalid log blocks per segment (%u)\n", + "Invalid log blocks per segment (%u)", le32_to_cpu(raw_super->log_blocks_per_seg)); return 1; } @@ -2587,7 +2596,8 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi) unsigned int log_blocks_per_seg; unsigned int segment_count_main; unsigned int cp_pack_start_sum, cp_payload; - block_t user_block_count; + block_t user_block_count, valid_user_blocks; + block_t avail_node_count, valid_node_count; int i, j; total = le32_to_cpu(raw_super->segment_count); @@ -2622,6 +2632,24 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi) return 1; } + valid_user_blocks = le64_to_cpu(ckpt->valid_block_count); + if 
(valid_user_blocks > user_block_count) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong valid_user_blocks: %u, user_block_count: %u", + valid_user_blocks, user_block_count); + return 1; + } + + valid_node_count = le32_to_cpu(ckpt->valid_node_count); + avail_node_count = sbi->total_node_count - sbi->nquota_files - + F2FS_RESERVED_NODE_NUM; + if (valid_node_count > avail_node_count) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong valid_node_count: %u, avail_node_count: %u", + valid_node_count, avail_node_count); + return 1; + } + main_segs = le32_to_cpu(raw_super->segment_count_main); blocks_per_seg = sbi->blocks_per_seg; @@ -2793,9 +2821,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi) if (nr_sectors & (bdev_zone_sectors(bdev) - 1)) FDEV(devi).nr_blkz++; - FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz, - GFP_KERNEL); - if (!FDEV(devi).blkz_type) + FDEV(devi).blkz_seq = f2fs_kzalloc(sbi, + BITS_TO_LONGS(FDEV(devi).nr_blkz) + * sizeof(unsigned long), + GFP_KERNEL); + if (!FDEV(devi).blkz_seq) return -ENOMEM; #define F2FS_REPORT_NR_ZONES 4096 @@ -2822,7 +2852,8 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi) } for (i = 0; i < nr_zones; i++) { - FDEV(devi).blkz_type[n] = zones[i].type; + if (zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL) + set_bit(n, FDEV(devi).blkz_seq); sector += zones[i].len; n++; } @@ -3105,7 +3136,7 @@ try_onemore: #ifndef CONFIG_BLK_DEV_ZONED if (f2fs_sb_has_blkzoned(sbi)) { f2fs_msg(sb, KERN_ERR, - "Zoned block device support is not enabled\n"); + "Zoned block device support is not enabled"); err = -EOPNOTSUPP; goto free_sb_buf; } @@ -3350,10 +3381,17 @@ try_onemore: * mount should be failed, when device has readonly mode, and * previous checkpoint was not done by clean system shutdown. */ - if (bdev_read_only(sb->s_bdev) && - !is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; - goto free_meta; + if (f2fs_hw_is_readonly(sbi)) { + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS; + f2fs_msg(sb, KERN_ERR, + "Need to recover fsync data, but " + "write access unavailable"); + goto free_meta; + } + f2fs_msg(sbi->sb, KERN_INFO, "write access " + "unavailable, skipping recovery"); + goto reset_checkpoint; } if (need_fsck) diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c index 848a785abe25..e791741d193b 100644 --- a/fs/f2fs/xattr.c +++ b/fs/f2fs/xattr.c @@ -202,12 +202,17 @@ static inline const struct xattr_handler *f2fs_xattr_handler(int index) return handler; } -static struct f2fs_xattr_entry *__find_xattr(void *base_addr, int index, - size_t len, const char *name) +static struct f2fs_xattr_entry *__find_xattr(void *base_addr, + void *last_base_addr, int index, + size_t len, const char *name) { struct f2fs_xattr_entry *entry; list_for_each_xattr(entry, base_addr) { + if ((void *)(entry) + sizeof(__u32) > last_base_addr || + (void *)XATTR_NEXT_ENTRY(entry) > last_base_addr) + return NULL; + if (entry->e_name_index != index) continue; if (entry->e_name_len != len) @@ -297,20 +302,22 @@ static int lookup_all_xattrs(struct inode *inode, struct page *ipage, const char *name, struct f2fs_xattr_entry **xe, void **base_addr, int *base_size) { - void *cur_addr, *txattr_addr, *last_addr = NULL; + void *cur_addr, *txattr_addr, *last_txattr_addr; + void *last_addr = NULL; nid_t xnid = F2FS_I(inode)->i_xattr_nid; - unsigned int size = xnid ? 
VALID_XATTR_BLOCK_SIZE : 0; unsigned int inline_size = inline_xattr_size(inode); int err = 0; - if (!size && !inline_size) + if (!xnid && !inline_size) return -ENODATA; - *base_size = inline_size + size + XATTR_PADDING_SIZE; + *base_size = XATTR_SIZE(xnid, inode) + XATTR_PADDING_SIZE; txattr_addr = f2fs_kzalloc(F2FS_I_SB(inode), *base_size, GFP_NOFS); if (!txattr_addr) return -ENOMEM; + last_txattr_addr = (void *)txattr_addr + XATTR_SIZE(xnid, inode); + /* read from inline xattr */ if (inline_size) { err = read_inline_xattr(inode, ipage, txattr_addr); @@ -337,7 +344,11 @@ static int lookup_all_xattrs(struct inode *inode, struct page *ipage, else cur_addr = txattr_addr; - *xe = __find_xattr(cur_addr, index, len, name); + *xe = __find_xattr(cur_addr, last_txattr_addr, index, len, name); + if (!*xe) { + err = -EFAULT; + goto out; + } check: if (IS_XATTR_LAST_ENTRY(*xe)) { err = -ENODATA; @@ -581,7 +592,8 @@ static int __f2fs_setxattr(struct inode *inode, int index, struct page *ipage, int flags) { struct f2fs_xattr_entry *here, *last; - void *base_addr; + void *base_addr, *last_base_addr; + nid_t xnid = F2FS_I(inode)->i_xattr_nid; int found, newsize; size_t len; __u32 new_hsize; @@ -605,8 +617,14 @@ static int __f2fs_setxattr(struct inode *inode, int index, if (error) return error; + last_base_addr = (void *)base_addr + XATTR_SIZE(xnid, inode); + /* find entry with wanted name. */ - here = __find_xattr(base_addr, index, len, name); + here = __find_xattr(base_addr, last_base_addr, index, len, name); + if (!here) { + error = -EFAULT; + goto exit; + } found = IS_XATTR_LAST_ENTRY(here) ? 0 : 1; diff --git a/fs/f2fs/xattr.h b/fs/f2fs/xattr.h index 9172ee082ca8..a90920e2f949 100644 --- a/fs/f2fs/xattr.h +++ b/fs/f2fs/xattr.h @@ -71,6 +71,8 @@ struct f2fs_xattr_entry { entry = XATTR_NEXT_ENTRY(entry)) #define VALID_XATTR_BLOCK_SIZE (PAGE_SIZE - sizeof(struct node_footer)) #define XATTR_PADDING_SIZE (sizeof(__u32)) +#define XATTR_SIZE(x,i) (((x) ? 
VALID_XATTR_BLOCK_SIZE : 0) + \ + (inline_xattr_size(i))) #define MIN_OFFSET(i) XATTR_ALIGN(inline_xattr_size(i) + \ VALID_XATTR_BLOCK_SIZE) diff --git a/fs/fuse/control.c b/fs/fuse/control.c index fe80bea4ad89..14ce1e47f980 100644 --- a/fs/fuse/control.c +++ b/fs/fuse/control.c @@ -10,6 +10,7 @@ #include <linux/init.h> #include <linux/module.h> +#include <linux/fs_context.h> #define FUSE_CTL_SUPER_MAGIC 0x65735543 @@ -317,7 +318,7 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc) drop_nlink(d_inode(fuse_control_sb->s_root)); } -static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent) +static int fuse_ctl_fill_super(struct super_block *sb, struct fs_context *fctx) { static const struct tree_descr empty_descr = {""}; struct fuse_conn *fc; @@ -343,10 +344,19 @@ static int fuse_ctl_fill_super(struct super_block *sb, void *data, int silent) return 0; } -static struct dentry *fuse_ctl_mount(struct file_system_type *fs_type, - int flags, const char *dev_name, void *raw_data) +static int fuse_ctl_get_tree(struct fs_context *fc) { - return mount_single(fs_type, flags, raw_data, fuse_ctl_fill_super); + return vfs_get_super(fc, vfs_get_single_super, fuse_ctl_fill_super); +} + +static const struct fs_context_operations fuse_ctl_context_ops = { + .get_tree = fuse_ctl_get_tree, +}; + +static int fuse_ctl_init_fs_context(struct fs_context *fc) +{ + fc->ops = &fuse_ctl_context_ops; + return 0; } static void fuse_ctl_kill_sb(struct super_block *sb) @@ -365,7 +375,7 @@ static void fuse_ctl_kill_sb(struct super_block *sb) static struct file_system_type fuse_ctl_fs_type = { .owner = THIS_MODULE, .name = "fusectl", - .mount = fuse_ctl_mount, + .init_fs_context = fuse_ctl_init_fs_context, .kill_sb = fuse_ctl_kill_sb, }; MODULE_ALIAS_FS("fusectl"); diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 55a26f351467..4b41df1d4642 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -33,6 +33,8 @@ * closed. 
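Returning to the xattr hardening above: __find_xattr() now bounds every entry dereference by a precomputed end pointer (last_base_addr), so a corrupted entry list cannot walk past the buffer, and callers treat a NULL result as -EFAULT. A simplified, self-contained sketch of the same walk, using fixed-size entries and a hypothetical layout:

#include <stdio.h>
#include <string.h>

struct xentry {
        unsigned char name_len;         /* 0 terminates the list */
        char name[7];
};

static struct xentry *find_xattr(void *base, void *end, const char *name)
{
        struct xentry *e = base;

        /* refuse to dereference an entry that would run past the buffer,
         * mirroring the last_base_addr check added to __find_xattr() */
        while ((char *)(e + 1) <= (char *)end && e->name_len) {
                if (e->name_len == strlen(name) &&
                    !memcmp(e->name, name, e->name_len))
                        return e;
                e++;                    /* fixed-size entries keep this short */
        }
        return NULL;                    /* terminator hit or list corrupted */
}

int main(void)
{
        struct xentry buf[3] = { { 4, "user" }, { 3, "acl" }, { 0, "" } };

        printf("%s\n", find_xattr(buf, buf + 3, "acl") ? "found" : "missing");
        return 0;
}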
*/ +#define pr_fmt(fmt) "CUSE: " fmt + #include <linux/fuse.h> #include <linux/cdev.h> #include <linux/device.h> @@ -225,7 +227,7 @@ static int cuse_parse_one(char **pp, char *end, char **keyp, char **valp) return 0; if (end[-1] != '\0') { - printk(KERN_ERR "CUSE: info not properly terminated\n"); + pr_err("info not properly terminated\n"); return -EINVAL; } @@ -242,7 +244,7 @@ static int cuse_parse_one(char **pp, char *end, char **keyp, char **valp) key = strstrip(key); if (!strlen(key)) { - printk(KERN_ERR "CUSE: zero length info key specified\n"); + pr_err("zero length info key specified\n"); return -EINVAL; } @@ -282,12 +284,11 @@ static int cuse_parse_devinfo(char *p, size_t len, struct cuse_devinfo *devinfo) if (strcmp(key, "DEVNAME") == 0) devinfo->name = val; else - printk(KERN_WARNING "CUSE: unknown device info \"%s\"\n", - key); + pr_warn("unknown device info \"%s\"\n", key); } if (!devinfo->name || !strlen(devinfo->name)) { - printk(KERN_ERR "CUSE: DEVNAME unspecified\n"); + pr_err("DEVNAME unspecified\n"); return -EINVAL; } @@ -341,7 +342,7 @@ static void cuse_process_init_reply(struct fuse_conn *fc, struct fuse_req *req) else rc = register_chrdev_region(devt, 1, devinfo.name); if (rc) { - printk(KERN_ERR "CUSE: failed to register chrdev region\n"); + pr_err("failed to register chrdev region\n"); goto err; } diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9971a35cf1ef..24ea19cfe07e 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -906,8 +906,8 @@ static int fuse_check_page(struct page *page) 1 << PG_lru | 1 << PG_active | 1 << PG_reclaim))) { - printk(KERN_WARNING "fuse: trying to steal weird page\n"); - printk(KERN_WARNING " page=%p index=%li flags=%08lx, count=%i, mapcount=%i, mapping=%p\n", page, page->index, page->flags, page_count(page), page_mapcount(page), page->mapping); + pr_warn("trying to steal weird page\n"); + pr_warn(" page=%p index=%li flags=%08lx, count=%i, mapcount=%i, mapping=%p\n", page, page->index, page->flags, page_count(page), page_mapcount(page), page->mapping); return 1; } return 0; @@ -1317,6 +1317,16 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, unsigned reqsize; unsigned int hash; + /* + * Require sane minimum read buffer - that has capacity for fixed part + * of any request header + negotiated max_write room for data. If the + * requirement is not satisfied return EINVAL to the filesystem server + * to indicate that it is not following the FUSE server/client contract. + * Don't dequeue / abort any request.
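The comment above documents the contract, and the check itself follows directly. As a rough userspace model of which buffer sizes get rejected, assuming FUSE_MIN_READ_BUFFER is 8192 as in current uapi headers and using the patch's literal 4096 as header room:

#include <errno.h>
#include <stdio.h>

#define FUSE_MIN_READ_BUFFER 8192       /* assumed from the uapi header */
#define HEADER_ROOM          4096       /* the patch's fixed header room */

static size_t max_sz(size_t a, size_t b)
{
        return a > b ? a : b;
}

static int dev_do_read(size_t nbytes, size_t max_write)
{
        if (nbytes < max_sz(FUSE_MIN_READ_BUFFER, HEADER_ROOM + max_write))
                return -EINVAL;         /* reject; dequeue nothing */
        return 0;                       /* safe to dequeue a request */
}

int main(void)
{
        printf("%d\n", dev_do_read(4096, 65536));       /* -22 (EINVAL) */
        printf("%d\n", dev_do_read(69632, 65536));      /* 0 */
        return 0;
}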
+ */ + if (nbytes < max_t(size_t, FUSE_MIN_READ_BUFFER, 4096 + fc->max_write)) + return -EINVAL; + restart: spin_lock(&fiq->waitq.lock); err = -EAGAIN; @@ -1749,7 +1759,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode, offset = outarg->offset & ~PAGE_MASK; file_size = i_size_read(inode); - num = outarg->size; + num = min(outarg->size, fc->max_write); if (outarg->offset > file_size) num = 0; else if (outarg->offset + num > file_size) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 06096b60f1df..3959f08279e6 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -178,7 +178,9 @@ void fuse_finish_open(struct inode *inode, struct file *file) if (!(ff->open_flags & FOPEN_KEEP_CACHE)) invalidate_inode_pages2(inode->i_mapping); - if (ff->open_flags & FOPEN_NONSEEKABLE) + if (ff->open_flags & FOPEN_STREAM) + stream_open(inode, file); + else if (ff->open_flags & FOPEN_NONSEEKABLE) nonseekable_open(inode, file); if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -462,7 +464,7 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end, memset(&inarg, 0, sizeof(inarg)); inarg.fh = ff->fh; - inarg.fsync_flags = datasync ? 1 : 0; + inarg.fsync_flags = datasync ? FUSE_FSYNC_FDATASYNC : 0; args.in.h.opcode = opcode; args.in.h.nodeid = get_node_id(inode); args.in.numargs = 1; @@ -1586,7 +1588,7 @@ __acquires(fi->lock) { struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); - size_t crop = i_size_read(inode); + loff_t crop = i_size_read(inode); struct fuse_req *req; while (fi->writectr >= 0 && !list_empty(&fi->queued_writes)) { @@ -2576,8 +2578,13 @@ long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, #if BITS_PER_LONG == 32 inarg.flags |= FUSE_IOCTL_32BIT; #else - if (flags & FUSE_IOCTL_COMPAT) + if (flags & FUSE_IOCTL_COMPAT) { inarg.flags |= FUSE_IOCTL_32BIT; +#ifdef CONFIG_X86_X32 + if (in_x32_syscall()) + inarg.flags |= FUSE_IOCTL_COMPAT_X32; +#endif + } #endif /* assume all the iovs returned by client always fits in a page */ @@ -3044,6 +3051,13 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, } } + if (!(mode & FALLOC_FL_KEEP_SIZE) && + offset + length > i_size_read(inode)) { + err = inode_newsize_ok(inode, offset + length); + if (err) + return err; + } + if (!(mode & FALLOC_FL_KEEP_SIZE)) set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 0920c0c032a0..24dbca777775 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -9,6 +9,10 @@ #ifndef _FS_FUSE_I_H #define _FS_FUSE_I_H +#ifndef pr_fmt +# define pr_fmt(fmt) "fuse: " fmt +#endif + #include <linux/fuse.h> #include <linux/fs.h> #include <linux/mount.h> @@ -690,6 +694,9 @@ struct fuse_conn { /** Use enhanced/automatic page cache invalidation. */ unsigned auto_inval_data:1; + /** Filesystem is fully responsible for page cache invalidation. */ + unsigned explicit_inval_data:1; + /** Does the filesystem support readdirplus?
*/ unsigned do_readdirplus:1; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f485d09d14df..4bb885b0f032 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -81,14 +81,12 @@ struct fuse_forget_link *fuse_alloc_forget(void) static struct inode *fuse_alloc_inode(struct super_block *sb) { - struct inode *inode; struct fuse_inode *fi; - inode = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL); - if (!inode) + fi = kmem_cache_alloc(fuse_inode_cachep, GFP_KERNEL); + if (!fi) return NULL; - fi = get_fuse_inode(inode); fi->i_time = 0; fi->inval_mask = 0; fi->nodeid = 0; @@ -100,11 +98,11 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); if (!fi->forget) { - kmem_cache_free(fuse_inode_cachep, inode); + kmem_cache_free(fuse_inode_cachep, fi); return NULL; } - return inode; + return &fi->inode; } static void fuse_free_inode(struct inode *inode) @@ -233,7 +231,8 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, if (oldsize != attr->size) { truncate_pagecache(inode, attr->size); - inval = true; + if (!fc->explicit_inval_data) + inval = true; } else if (fc->auto_inval_data) { struct timespec64 new_mtime = { .tv_sec = attr->mtime, @@ -908,6 +907,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) fc->dont_mask = 1; if (arg->flags & FUSE_AUTO_INVAL_DATA) fc->auto_inval_data = 1; + else if (arg->flags & FUSE_EXPLICIT_INVAL_DATA) + fc->explicit_inval_data = 1; if (arg->flags & FUSE_DO_READDIRPLUS) { fc->do_readdirplus = 1; if (arg->flags & FUSE_READDIRPLUS_AUTO) @@ -969,7 +970,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) FUSE_WRITEBACK_CACHE | FUSE_NO_OPEN_SUPPORT | FUSE_PARALLEL_DIROPS | FUSE_HANDLE_KILLPRIV | FUSE_POSIX_ACL | FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS | - FUSE_NO_OPENDIR_SUPPORT; + FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA; req->in.h.opcode = FUSE_INIT; req->in.numargs = 1; req->in.args[0].size = sizeof(*arg); @@ -1393,8 +1394,8 @@ static int __init fuse_init(void) { int res; - printk(KERN_INFO "fuse init (API version %i.%i)\n", - FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION); + pr_info("init (API version %i.%i)\n", + FUSE_KERNEL_VERSION, FUSE_KERNEL_MINOR_VERSION); INIT_LIST_HEAD(&fuse_conn_list); res = fuse_fs_init(); @@ -1430,7 +1431,7 @@ static int __init fuse_init(void) static void __exit fuse_exit(void) { - printk(KERN_DEBUG "fuse exit\n"); + pr_debug("exit\n"); fuse_ctl_cleanup(); fuse_sysfs_cleanup(); diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c index 1787d295834e..08e4996adc23 100644 --- a/fs/gfs2/sys.c +++ b/fs/gfs2/sys.c @@ -650,7 +650,6 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp) char ro[20]; char spectator[20]; char *envp[] = { ro, spectator, NULL }; - int sysfs_frees_sdp = 0; sprintf(ro, "RDONLY=%d", sb_rdonly(sb)); sprintf(spectator, "SPECTATOR=%d", sdp->sd_args.ar_spectator ? 1 : 0); @@ -661,8 +660,6 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp) if (error) goto fail_reg; - sysfs_frees_sdp = 1; /* Freeing sdp is now done by sysfs calling - function gfs2_sbd_release. 
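The FUSE_EXPLICIT_INVAL_DATA negotiation above inverts the default invalidation policy: with the flag set, a server-reported size change no longer invalidates the page cache automatically, and the filesystem must send explicit invalidation notifications instead. A small sketch of the resulting decision, modeling the connection flags as plain booleans:

#include <stdbool.h>
#include <stdio.h>

struct conn {
        bool auto_inval_data;           /* FUSE_AUTO_INVAL_DATA */
        bool explicit_inval_data;       /* FUSE_EXPLICIT_INVAL_DATA */
};

/* a size change invalidates the cache unless the server opted into
 * explicit invalidation; an mtime change only does so under the
 * auto_inval_data policy */
static bool should_invalidate(const struct conn *fc, bool size_changed,
                              bool mtime_changed)
{
        if (size_changed)
                return !fc->explicit_inval_data;
        if (mtime_changed)
                return fc->auto_inval_data;
        return false;
}

int main(void)
{
        struct conn legacy = { .auto_inval_data = true };
        struct conn explicit_inval = { .explicit_inval_data = true };

        printf("%d %d\n",
               should_invalidate(&legacy, true, false),           /* 1 */
               should_invalidate(&explicit_inval, true, false));  /* 0 */
        return 0;
}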
*/ error = sysfs_create_group(&sdp->sd_kobj, &tune_group); if (error) goto fail_reg; @@ -687,10 +684,7 @@ fail_tune: fail_reg: free_percpu(sdp->sd_lkstats); fs_err(sdp, "error %d adding sysfs files\n", error); - if (sysfs_frees_sdp) - kobject_put(&sdp->sd_kobj); - else - kfree(sdp); + kobject_put(&sdp->sd_kobj); sb->s_fs_info = NULL; return error; } diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 5433e37fb0c5..8c7cbac7183c 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -108,6 +108,47 @@ void fsnotify_sb_delete(struct super_block *sb) } /* + * fsnotify_nameremove - a filename was removed from a directory + * + * This is mostly called under parent vfs inode lock so name and + * dentry->d_parent should be stable. However there are some corner cases where + * the inode lock is not held. So to be on the safe side and be resilient to future + * callers and out of tree users of d_delete(), we do not assume that d_parent + * and d_name are stable and we use dget_parent() and + * take_dentry_name_snapshot() to grab stable references. + */ +void fsnotify_nameremove(struct dentry *dentry, int isdir) +{ + struct dentry *parent; + struct name_snapshot name; + __u32 mask = FS_DELETE; + + /* d_delete() of pseudo inode? (e.g. __ns_get_path() playing tricks) */ + if (IS_ROOT(dentry)) + return; + + if (isdir) + mask |= FS_ISDIR; + + parent = dget_parent(dentry); + /* Avoid unneeded take_dentry_name_snapshot() */ + if (!(d_inode(parent)->i_fsnotify_mask & FS_DELETE) && + !(dentry->d_sb->s_fsnotify_mask & FS_DELETE)) + goto out_dput; + + take_dentry_name_snapshot(&name, dentry); + + fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, + &name.name, 0); + + release_dentry_name_snapshot(&name); + +out_dput: + dput(parent); +} +EXPORT_SYMBOL(fsnotify_nameremove); + +/* * Given an inode, first check if we care what happens to our children. Inotify * and dnotify both tell their parents about events. If we care about any event * on a child we run all of our children and set a dentry flag saying that the diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 22acb0a79b53..b251105f646f 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -619,6 +619,11 @@ restart: /* mark should be the last entry. last is the current last entry */ hlist_add_behind_rcu(&mark->obj_list, &last->obj_list); added: + /* + * Since connector is attached to object using cmpxchg() we are + * guaranteed that connector initialization is fully visible by anyone + * seeing mark->connector set. + */ WRITE_ONCE(mark->connector, conn); out_err: spin_unlock(&conn->lock); diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 68b3303e4b46..56feaa739979 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -909,14 +909,14 @@ static bool ovl_open_need_copy_up(struct dentry *dentry, int flags) return true; } -int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags) +int ovl_maybe_copy_up(struct dentry *dentry, int flags) { int err = 0; - if (ovl_open_need_copy_up(dentry, file_flags)) { + if (ovl_open_need_copy_up(dentry, flags)) { err = ovl_want_write(dentry); if (!err) { - err = ovl_copy_up_flags(dentry, file_flags); + err = ovl_copy_up_flags(dentry, flags); ovl_drop_write(dentry); } } diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 82c129bfe58d..93872bb50230 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -260,7 +260,7 @@ static int ovl_instantiate(struct dentry *dentry, struct inode *inode, * hashed directory inode aliases.
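A detail of fsnotify_nameremove() above worth calling out is the cheap gate before the expensive work: if neither the parent inode's mask nor the superblock's mask includes FS_DELETE, the function bails out before taking a name snapshot. A compact sketch of that early-out; the mask constants match the fsnotify ABI, the rest is illustrative:

#include <stdbool.h>
#include <stdio.h>

#define FS_DELETE 0x00000200u
#define FS_ISDIR  0x40000000u

static bool anyone_wants_delete(unsigned int parent_mask, unsigned int sb_mask)
{
        return ((parent_mask | sb_mask) & FS_DELETE) != 0;
}

static void nameremove(unsigned int parent_mask, unsigned int sb_mask,
                       bool isdir)
{
        unsigned int mask = FS_DELETE | (isdir ? FS_ISDIR : 0);

        if (!anyone_wants_delete(parent_mask, sb_mask))
                return;         /* skip the name snapshot and the event */
        printf("emit delete event, mask=%#x\n", mask);
}

int main(void)
{
        nameremove(0, 0, false);                /* nobody listening: silent */
        nameremove(FS_DELETE, 0, true);         /* parent watches deletes */
        return 0;
}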
*/ inode = ovl_get_inode(dentry->d_sb, &oip); - if (WARN_ON(IS_ERR(inode))) + if (IS_ERR(inode)) return PTR_ERR(inode); } else { WARN_ON(ovl_inode_real(inode) != d_inode(newdentry)); } diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 84dd957efa24..540a8b845145 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -11,6 +11,7 @@ #include <linux/mount.h> #include <linux/xattr.h> #include <linux/uio.h> +#include <linux/uaccess.h> #include "overlayfs.h" static char ovl_whatisit(struct inode *inode, struct inode *realinode) @@ -29,10 +30,11 @@ static struct file *ovl_open_realfile(const struct file *file, struct inode *inode = file_inode(file); struct file *realfile; const struct cred *old_cred; + int flags = file->f_flags | O_NOATIME | FMODE_NONOTIFY; old_cred = ovl_override_creds(inode->i_sb); - realfile = open_with_fake_path(&file->f_path, file->f_flags | O_NOATIME, - realinode, current_cred()); + realfile = open_with_fake_path(&file->f_path, flags, realinode, + current_cred()); revert_creds(old_cred); pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n", @@ -50,7 +52,7 @@ static int ovl_change_flags(struct file *file, unsigned int flags) int err; /* No atime modification on underlying */ - flags |= O_NOATIME; + flags |= O_NOATIME | FMODE_NONOTIFY; /* If some flag changed that cannot be changed then something's amiss */ if (WARN_ON((file->f_flags ^ flags) & ~OVL_SETFL_MASK)) @@ -116,11 +118,10 @@ static int ovl_real_fdget(const struct file *file, struct fd *real) static int ovl_open(struct inode *inode, struct file *file) { - struct dentry *dentry = file_dentry(file); struct file *realfile; int err; - err = ovl_open_maybe_copy_up(dentry, file->f_flags); + err = ovl_maybe_copy_up(file_dentry(file), file->f_flags); if (err) return err; @@ -145,11 +146,47 @@ static int ovl_release(struct inode *inode, struct file *file) static loff_t ovl_llseek(struct file *file, loff_t offset, int whence) { - struct inode *realinode = ovl_inode_real(file_inode(file)); + struct inode *inode = file_inode(file); + struct fd real; + const struct cred *old_cred; + ssize_t ret; + + /* + * The two special cases below do not need to involve real fs, + * so we can optimize concurrent callers. + */ + if (offset == 0) { + if (whence == SEEK_CUR) + return file->f_pos; + + if (whence == SEEK_SET) + return vfs_setpos(file, 0, 0); + } + + ret = ovl_real_fdget(file, &real); + if (ret) + return ret; + + /* + * Overlay file f_pos is the master copy that is preserved + * through copy up and modified on read/write, but only real + * fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose + * limitations that are more strict than ->s_maxbytes for specific + * files, so we use the real file to perform seeks.
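The comment above explains why the overlay keeps the master f_pos but delegates the actual seek to the real file; the core of that handoff fits in a few lines of plain C. Here vfs_llseek() is faked and only SEEK_CUR/absolute positioning is modeled, so treat this as a sketch, not the overlayfs code:

#include <stdio.h>

struct file { long long f_pos; };

/* stand-in for vfs_llseek() on the underlying filesystem */
static long long real_llseek(struct file *f, long long offset, int whence)
{
        f->f_pos = (whence == SEEK_CUR) ? f->f_pos + offset : offset;
        return f->f_pos;
}

static long long ovl_llseek(struct file *ovl, struct file *real,
                            long long offset, int whence)
{
        long long ret;

        real->f_pos = ovl->f_pos;       /* push the master copy down */
        ret = real_llseek(real, offset, whence);
        ovl->f_pos = real->f_pos;       /* pull the result back up */
        return ret;
}

int main(void)
{
        struct file ovl = { 100 }, real = { 0 };

        printf("%lld\n", ovl_llseek(&ovl, &real, 10, SEEK_CUR));  /* 110 */
        printf("%lld\n", ovl.f_pos);                              /* 110 */
        return 0;
}

In the kernel version the handoff also happens under the inode lock and with overridden credentials; the sketch omits both for brevity.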
+ */ + inode_lock(inode); + real.file->f_pos = file->f_pos; + + old_cred = ovl_override_creds(inode->i_sb); + ret = vfs_llseek(real.file, offset, whence); + revert_creds(old_cred); + + file->f_pos = real.file->f_pos; + inode_unlock(inode); + + fdput(real); - return generic_file_llseek_size(file, offset, whence, - realinode->i_sb->s_maxbytes, - i_size_read(realinode)); + return ret; } static void ovl_file_accessed(struct file *file) @@ -372,10 +409,68 @@ static long ovl_real_ioctl(struct file *file, unsigned int cmd, return ret; } -static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +static unsigned int ovl_get_inode_flags(struct inode *inode) +{ + unsigned int flags = READ_ONCE(inode->i_flags); + unsigned int ovl_iflags = 0; + + if (flags & S_SYNC) + ovl_iflags |= FS_SYNC_FL; + if (flags & S_APPEND) + ovl_iflags |= FS_APPEND_FL; + if (flags & S_IMMUTABLE) + ovl_iflags |= FS_IMMUTABLE_FL; + if (flags & S_NOATIME) + ovl_iflags |= FS_NOATIME_FL; + + return ovl_iflags; +} + +static long ovl_ioctl_set_flags(struct file *file, unsigned long arg) { long ret; struct inode *inode = file_inode(file); + unsigned int flags; + unsigned int old_flags; + + if (!inode_owner_or_capable(inode)) + return -EACCES; + + if (get_user(flags, (int __user *) arg)) + return -EFAULT; + + ret = mnt_want_write_file(file); + if (ret) + return ret; + + inode_lock(inode); + + /* Check the capability before cred override */ + ret = -EPERM; + old_flags = ovl_get_inode_flags(inode); + if (((flags ^ old_flags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) && + !capable(CAP_LINUX_IMMUTABLE)) + goto unlock; + + ret = ovl_maybe_copy_up(file_dentry(file), O_WRONLY); + if (ret) + goto unlock; + + ret = ovl_real_ioctl(file, FS_IOC_SETFLAGS, arg); + + ovl_copyflags(ovl_inode_real(inode), inode); +unlock: + inode_unlock(inode); + + mnt_drop_write_file(file); + + return ret; + +} + +static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + long ret; switch (cmd) { case FS_IOC_GETFLAGS: @@ -383,23 +478,7 @@ static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg) break; case FS_IOC_SETFLAGS: - if (!inode_owner_or_capable(inode)) - return -EACCES; - - ret = mnt_want_write_file(file); - if (ret) - return ret; - - ret = ovl_copy_up_with_data(file_dentry(file)); - if (!ret) { - ret = ovl_real_ioctl(file, cmd, arg); - - inode_lock(inode); - ovl_copyflags(ovl_inode_real(inode), inode); - inode_unlock(inode); - } - - mnt_drop_write_file(file); + ret = ovl_ioctl_set_flags(file, arg); break; default: diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index 3b7ed5d2279c..b48273e846ad 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -832,7 +832,7 @@ struct inode *ovl_get_inode(struct super_block *sb, int fsid = bylower ? oip->lowerpath->layer->fsid : 0; bool is_dir, metacopy = false; unsigned long ino = 0; - int err = -ENOMEM; + int err = oip->newinode ? 
-EEXIST : -ENOMEM; if (!realinode) realinode = d_inode(lowerdentry); @@ -917,6 +917,7 @@ out: return inode; out_err: + pr_warn_ratelimited("overlayfs: failed to get inode (%i)\n", err); inode = ERR_PTR(err); goto out; } diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 9c6018287d57..d26efed9f80a 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -421,7 +421,7 @@ extern const struct file_operations ovl_file_operations; int ovl_copy_up(struct dentry *dentry); int ovl_copy_up_with_data(struct dentry *dentry); int ovl_copy_up_flags(struct dentry *dentry, int flags); -int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags); +int ovl_maybe_copy_up(struct dentry *dentry, int flags); int ovl_copy_xattr(struct dentry *old, struct dentry *new); int ovl_set_attr(struct dentry *upper, struct kstat *stat); struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper); diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index fc20e06c56ba..9ad72ea7f71f 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -9,7 +9,7 @@ * on the Melbourne quota system as used on BSD derived systems. The internal * implementation is based on one of the several variants of the LINUX * inode-subsystem with added complexity of the diskquota system. - * + * * Author: Marco van Wieringen <mvw@planets.elm.net> * * Fixes: Dmitry Gorodchanin <pgmdsg@ibi.com>, 11 Feb 96 @@ -51,7 +51,7 @@ * Added journalled quota support, fix lock inversion problems * Jan Kara, <jack@suse.cz>, 2003,2004 * - * (C) Copyright 1994 - 1997 Marco van Wieringen + * (C) Copyright 1994 - 1997 Marco van Wieringen */ #include <linux/errno.h> @@ -197,7 +197,7 @@ static struct quota_format_type *find_quota_format(int id) int qm; spin_unlock(&dq_list_lock); - + for (qm = 0; module_names[qm].qm_fmt_id && module_names[qm].qm_fmt_id != id; qm++) ; @@ -424,10 +424,11 @@ int dquot_acquire(struct dquot *dquot) struct quota_info *dqopt = sb_dqopt(dquot->dq_sb); mutex_lock(&dquot->dq_lock); - if (!test_bit(DQ_READ_B, &dquot->dq_flags)) + if (!test_bit(DQ_READ_B, &dquot->dq_flags)) { ret = dqopt->ops[dquot->dq_id.type]->read_dqblk(dquot); - if (ret < 0) - goto out_iolock; + if (ret < 0) + goto out_iolock; + } /* Make sure flags update is visible after dquot has been filled */ smp_mb__before_atomic(); set_bit(DQ_READ_B, &dquot->dq_flags); @@ -1049,7 +1050,9 @@ static void remove_dquot_ref(struct super_block *sb, int type, struct list_head *tofree_head) { struct inode *inode; +#ifdef CONFIG_QUOTA_DEBUG int reserved = 0; +#endif spin_lock(&sb->s_inode_list_lock); list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { @@ -1061,8 +1064,10 @@ static void remove_dquot_ref(struct super_block *sb, int type, */ spin_lock(&dq_data_lock); if (!IS_NOQUOTA(inode)) { +#ifdef CONFIG_QUOTA_DEBUG if (unlikely(inode_get_rsv_space(inode) > 0)) reserved = 1; +#endif remove_inode_dquot_ref(inode, type, tofree_head); } spin_unlock(&dq_data_lock); @@ -1663,7 +1668,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags) for (cnt = 0; cnt < MAXQUOTAS; cnt++) { if (!dquots[cnt]) continue; - if (flags & DQUOT_SPACE_RESERVE) { + if (reserve) { ret = dquot_add_space(dquots[cnt], 0, number, flags, &warn[cnt]); } else { @@ -1676,13 +1681,11 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags) if (!dquots[cnt]) continue; spin_lock(&dquots[cnt]->dq_dqb_lock); - if (flags & DQUOT_SPACE_RESERVE) { - dquots[cnt]->dq_dqb.dqb_rsvspace -= - number; - } else { - dquots[cnt]->dq_dqb.dqb_curspace -= - 
number; - } + if (reserve) + dquot_free_reserved_space(dquots[cnt], + number); + else + dquot_decr_space(dquots[cnt], number); spin_unlock(&dquots[cnt]->dq_dqb_lock); } spin_unlock(&inode->i_lock); @@ -1733,7 +1736,7 @@ int dquot_alloc_inode(struct inode *inode) continue; /* Back out changes we already did */ spin_lock(&dquots[cnt]->dq_dqb_lock); - dquots[cnt]->dq_dqb.dqb_curinodes--; + dquot_decr_inodes(dquots[cnt], 1); spin_unlock(&dquots[cnt]->dq_dqb_lock); } goto warn_put_all; @@ -2397,7 +2400,7 @@ out_file_flags: out_fmt: put_quota_format(fmt); - return error; + return error; } /* Reenable quotas on remount RW */ @@ -2775,7 +2778,7 @@ int dquot_get_state(struct super_block *sb, struct qc_state *state) struct qc_type_state *tstate; struct quota_info *dqopt = sb_dqopt(sb); int type; - + memset(state, 0, sizeof(*state)); for (type = 0; type < MAXQUOTAS; type++) { if (!sb_has_quota_active(sb, type)) diff --git a/fs/quota/quota_v1.c b/fs/quota/quota_v1.c index 7ac5298aba70..9f2b2573b83c 100644 --- a/fs/quota/quota_v1.c +++ b/fs/quota/quota_v1.c @@ -127,7 +127,7 @@ static int v1_check_quota_file(struct super_block *sb, int type) { struct inode *inode = sb_dqopt(sb)->files[type]; ulong blocks; - size_t off; + size_t off; struct v2_disk_dqheader dqhead; ssize_t size; loff_t isize; diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c index a73e5b34db41..3c30034e733f 100644 --- a/fs/quota/quota_v2.c +++ b/fs/quota/quota_v2.c @@ -78,7 +78,7 @@ static int v2_check_quota_file(struct super_block *sb, int type) struct v2_disk_dqheader dqhead; static const uint quota_magics[] = V2_INITQMAGICS; static const uint quota_versions[] = V2_INITQVERSIONS; - + if (v2_read_header(sb, type, &dqhead)) return 0; if (le32_to_cpu(dqhead.dqh_magic) != quota_magics[type] || diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c index 8a76f9d14bc6..36346dc4cec0 100644 --- a/fs/reiserfs/journal.c +++ b/fs/reiserfs/journal.c @@ -1844,7 +1844,7 @@ static int flush_used_journal_lists(struct super_block *s, * removes any nodes in table with name block and dev as bh. * only touches the hnext and hprev pointers. */ -void remove_journal_hash(struct super_block *sb, +static void remove_journal_hash(struct super_block *sb, struct reiserfs_journal_cnode **table, struct reiserfs_journal_list *jl, unsigned long block, int remove_freed) diff --git a/fs/udf/namei.c b/fs/udf/namei.c index 58cc2414992b..77b6d89b9bcd 100644 --- a/fs/udf/namei.c +++ b/fs/udf/namei.c @@ -304,21 +304,6 @@ static struct dentry *udf_lookup(struct inode *dir, struct dentry *dentry, if (dentry->d_name.len > UDF_NAME_LEN) return ERR_PTR(-ENAMETOOLONG); -#ifdef UDF_RECOVERY - /* temporary shorthand for specifying files by inode number */ - if (!strncmp(dentry->d_name.name, ".B=", 3)) { - struct kernel_lb_addr lb = { - .logicalBlockNum = 0, - .partitionReferenceNum = - simple_strtoul(dentry->d_name.name + 3, - NULL, 0), - }; - inode = udf_iget(dir->i_sb, lb); - if (IS_ERR(inode)) - return inode; - } else -#endif /* UDF_RECOVERY */ - fi = udf_find_entry(dir, &dentry->d_name, &fibh, &cfi); if (IS_ERR(fi)) return ERR_CAST(fi); diff --git a/fs/udf/super.c b/fs/udf/super.c index f64691f2168a..a14346137361 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -566,6 +566,11 @@ static int udf_parse_options(char *options, struct udf_options *uopt, if (!remount) { if (uopt->nls_map) unload_nls(uopt->nls_map); + /* + * load_nls() failure is handled later in + * udf_fill_super() after all options are + * parsed.
+ */ uopt->nls_map = load_nls(args[0].from); uopt->flags |= (1 << UDF_FLAG_NLS_MAP); } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 732745f865b7..3813fe45effd 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -57,6 +57,8 @@ extern ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_mds(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index f5740423b002..65559900d4d7 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -164,6 +164,10 @@ struct f2fs_checkpoint { unsigned char sit_nat_version_bitmap[1]; } __packed; +#define CP_CHKSUM_OFFSET 4092 /* default chksum offset in checkpoint */ +#define CP_MIN_CHKSUM_OFFSET \ + (offsetof(struct f2fs_checkpoint, sit_nat_version_bitmap)) + /* * For orphan inode management */ @@ -198,11 +202,12 @@ struct f2fs_extent { get_extra_isize(inode)) #define DEF_NIDS_PER_INODE 5 /* Node IDs in an Inode */ #define ADDRS_PER_INODE(inode) addrs_per_inode(inode) -#define ADDRS_PER_BLOCK 1018 /* Address Pointers in a Direct Block */ +#define DEF_ADDRS_PER_BLOCK 1018 /* Address Pointers in a Direct Block */ +#define ADDRS_PER_BLOCK(inode) addrs_per_block(inode) #define NIDS_PER_BLOCK 1018 /* Node IDs in an Indirect Block */ #define ADDRS_PER_PAGE(page, inode) \ - (IS_INODE(page) ? ADDRS_PER_INODE(inode) : ADDRS_PER_BLOCK) + (IS_INODE(page) ? ADDRS_PER_INODE(inode) : ADDRS_PER_BLOCK(inode)) #define NODE_DIR1_BLOCK (DEF_ADDRS_PER_INODE + 1) #define NODE_DIR2_BLOCK (DEF_ADDRS_PER_INODE + 2) @@ -267,7 +272,7 @@ struct f2fs_inode { } __packed; struct direct_node { - __le32 addr[ADDRS_PER_BLOCK]; /* array of data block address */ + __le32 addr[DEF_ADDRS_PER_BLOCK]; /* array of data block address */ } __packed; struct indirect_node { diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 0c0ef3078a22..94972e8eb6d1 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -152,39 +152,6 @@ static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt) } /* - * fsnotify_nameremove - a filename was removed from a directory - * - * This is mostly called under parent vfs inode lock so name and - * dentry->d_parent should be stable. However there are some corner cases where - * inode lock is not held. So to be on the safe side and be reselient to future - * callers and out of tree users of d_delete(), we do not assume that d_parent - * and d_name are stable and we use dget_parent() and - * take_dentry_name_snapshot() to grab stable references. - */ -static inline void fsnotify_nameremove(struct dentry *dentry, int isdir) -{ - struct dentry *parent; - struct name_snapshot name; - __u32 mask = FS_DELETE; - - /* d_delete() of pseudo inode? (e.g. 
__ns_get_path() playing tricks) */ - if (IS_ROOT(dentry)) - return; - - if (isdir) - mask |= FS_ISDIR; - - parent = dget_parent(dentry); - take_dentry_name_snapshot(&name, dentry); - - fsnotify(d_inode(parent), mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, - &name.name, 0); - - release_dentry_name_snapshot(&name); - dput(parent); -} - -/* * fsnotify_inoderemove - an inode is going away */ static inline void fsnotify_inoderemove(struct inode *inode) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index c28f6ed1f59b..a9f9dcc1e515 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -355,6 +355,7 @@ extern int __fsnotify_parent(const struct path *path, struct dentry *dentry, __u extern void __fsnotify_inode_delete(struct inode *inode); extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt); extern void fsnotify_sb_delete(struct super_block *sb); +extern void fsnotify_nameremove(struct dentry *dentry, int isdir); extern u32 fsnotify_get_cookie(void); static inline int fsnotify_inode_watches_children(struct inode *inode) @@ -524,6 +525,9 @@ static inline void __fsnotify_vfsmount_delete(struct vfsmount *mnt) static inline void fsnotify_sb_delete(struct super_block *sb) {} +static inline void fsnotify_nameremove(struct dentry *dentry, int isdir) +{} + static inline void fsnotify_update_flags(struct dentry *dentry) {} diff --git a/include/linux/percpu.h b/include/linux/percpu.h index 70b7123f38c7..9909dc0e273a 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -26,16 +26,10 @@ #define PCPU_MIN_ALLOC_SHIFT 2 #define PCPU_MIN_ALLOC_SIZE (1 << PCPU_MIN_ALLOC_SHIFT) -/* number of bits per page, used to trigger a scan if blocks are > PAGE_SIZE */ -#define PCPU_BITS_PER_PAGE (PAGE_SIZE >> PCPU_MIN_ALLOC_SHIFT) - /* - * This determines the size of each metadata block. There are several subtle - * constraints around this constant. The reserved region must be a multiple of - * PCPU_BITMAP_BLOCK_SIZE. Additionally, PCPU_BITMAP_BLOCK_SIZE must be a - * multiple of PAGE_SIZE or PAGE_SIZE must be a multiple of - * PCPU_BITMAP_BLOCK_SIZE to align with the populated page map. The unit_size - * also has to be a multiple of PCPU_BITMAP_BLOCK_SIZE to ensure full blocks. + * PCPU_BITMAP_BLOCK_SIZE must be the same size as PAGE_SIZE because the + * updating of hints is used to manage nr_empty_pop_pages in both + * the chunk and globally.
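+ * + * Editor's illustration (assumes 4K pages; not part of the patch): with + * PAGE_SIZE == 4096 and PCPU_MIN_ALLOC_SHIFT == 2, PCPU_BITMAP_BLOCK_BITS + * works out to 4096 >> 2 == 1024 bits, so one md_block tracks exactly one + * page of 4-byte PCPU_MIN_ALLOC_SIZE units.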
*/ #define PCPU_BITMAP_BLOCK_SIZE PAGE_SIZE #define PCPU_BITMAP_BLOCK_BITS (PCPU_BITMAP_BLOCK_SIZE >> \ diff --git a/include/net/dsa.h b/include/net/dsa.h index 6aaaadd6a413..685294817712 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -99,24 +99,9 @@ struct __dsa_skb_cb { #define DSA_SKB_CB(skb) ((struct dsa_skb_cb *)((skb)->cb)) -#define DSA_SKB_CB_COPY(nskb, skb) \ - { *__DSA_SKB_CB(nskb) = *__DSA_SKB_CB(skb); } - -#define DSA_SKB_CB_ZERO(skb) \ - { *__DSA_SKB_CB(skb) = (struct __dsa_skb_cb) {0}; } - #define DSA_SKB_CB_PRIV(skb) \ ((void *)(skb)->cb + offsetof(struct __dsa_skb_cb, priv)) -#define DSA_SKB_CB_CLONE(_clone, _skb) \ - { \ - struct sk_buff *clone = _clone; \ - struct sk_buff *skb = _skb; \ - \ - DSA_SKB_CB_COPY(clone, skb); \ - DSA_SKB_CB(skb)->clone = clone; \ - } - struct dsa_switch_tree { struct list_head list; diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index a3916b4dd57e..53b96f12300c 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -533,6 +533,37 @@ TRACE_EVENT(f2fs_truncate_partial_nodes, __entry->err) ); +TRACE_EVENT(f2fs_file_write_iter, + + TP_PROTO(struct inode *inode, unsigned long offset, + unsigned long length, int ret), + + TP_ARGS(inode, offset, length, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(unsigned long, offset) + __field(unsigned long, length) + __field(int, ret) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->offset = offset; + __entry->length = length; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), ino = %lu, " + "offset = %lu, length = %lu, written(err) = %d", + show_dev_ino(__entry), + __entry->offset, + __entry->length, + __entry->ret) +); + TRACE_EVENT(f2fs_map_blocks, TP_PROTO(struct inode *inode, struct f2fs_map_blocks *map, int ret), @@ -1253,6 +1284,32 @@ DEFINE_EVENT(f2fs__page, f2fs_commit_inmem_page, TP_ARGS(page, type) ); +TRACE_EVENT(f2fs_filemap_fault, + + TP_PROTO(struct inode *inode, pgoff_t index, unsigned long ret), + + TP_ARGS(inode, index, ret), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(pgoff_t, index) + __field(unsigned long, ret) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->index = index; + __entry->ret = ret; + ), + + TP_printk("dev = (%d,%d), ino = %lu, index = %lu, ret = %lx", + show_dev_ino(__entry), + (unsigned long)__entry->index, + __entry->ret) +); + TRACE_EVENT(f2fs_writepages, TP_PROTO(struct inode *inode, struct writeback_control *wbc, int type), diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 72336bac7573..63e0cf66f01a 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -629,7 +629,7 @@ union bpf_attr { * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ * **->swhash** and *skb*\ **->l4hash** to 0). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -654,7 +654,7 @@ union bpf_attr { * flexibility and can handle sizes larger than 2 or 4 for the * checksum to update. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. 
Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -686,7 +686,7 @@ union bpf_attr { * flexibility and can handle sizes larger than 2 or 4 for the * checksum to update. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -741,7 +741,7 @@ union bpf_attr { * efficient, but it is handled through an action code where the * redirection happens only after the eBPF program has returned. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -806,7 +806,7 @@ union bpf_attr { * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to * be **ETH_P_8021Q**. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -818,7 +818,7 @@ union bpf_attr { * Description * Pop a VLAN header from the packet associated to *skb*. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1168,7 +1168,7 @@ union bpf_attr { * All values for *flags* are reserved for future usage, and must * be left at zero. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1281,7 +1281,7 @@ union bpf_attr { * implicitly linearizes, unclones and drops offloads from the * *skb*. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1317,7 +1317,7 @@ union bpf_attr { * **bpf_skb_pull_data()** to effectively unclone the *skb* from * the very beginning in case it is indeed cloned. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1369,7 +1369,7 @@ union bpf_attr { * All values for *flags* are reserved for future usage, and must * be left at zero. 
* - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1384,7 +1384,7 @@ union bpf_attr { * can be used to prepare the packet for pushing or popping * headers. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1518,20 +1518,20 @@ union bpf_attr { * * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size. * Adjusting mss in this way is not allowed for datagrams. * - * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 **: - * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 **: + * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4**, + * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6**: * Any new space is reserved to hold a tunnel header. * Configure skb offsets and other fields accordingly. * - * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE **: - * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: + * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE**, + * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP**: * Use with ENCAP_L3 flags to further specify the tunnel type. * - * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * * **BPF_F_ADJ_ROOM_ENCAP_L2**\ (*len*): * Use with ENCAP_L3/L4 flags to further specify the tunnel - * type; **len** is the length of the inner MAC header. + * type; *len* is the length of the inner MAC header. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1610,7 +1610,7 @@ union bpf_attr { * more flexibility as the user is free to store whatever meta * data they need. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1852,7 +1852,7 @@ union bpf_attr { * copied if necessary (i.e. if data was not linear and if start * and end pointers do not point to the same chunk). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1886,7 +1886,7 @@ union bpf_attr { * only possible to shrink the packet as of this writing, * therefore *delta* must be a negative integer. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2061,18 +2061,18 @@ union bpf_attr { * **BPF_LWT_ENCAP_IP** * IP encapsulation (GRE/GUE/IPIP/etc). 
The outer header * must be IPv4 or IPv6, followed by zero or more - * additional headers, up to LWT_BPF_MAX_HEADROOM total - * bytes in all prepended headers. Please note that - * if skb_is_gso(skb) is true, no more than two headers - * can be prepended, and the inner header, if present, - * should be either GRE or UDP/GUE. - * - * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of - * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called - * by bpf programs of types BPF_PROG_TYPE_LWT_IN and - * BPF_PROG_TYPE_LWT_XMIT. - * - * A call to this helper is susceptible to change the underlaying + * additional headers, up to **LWT_BPF_MAX_HEADROOM** + * total bytes in all prepended headers. Please note that + * if **skb_is_gso**\ (*skb*) is true, no more than two + * headers can be prepended, and the inner header, if + * present, should be either GRE or UDP/GUE. + * + * **BPF_LWT_ENCAP_SEG6**\ \* types can be called by BPF programs + * of type **BPF_PROG_TYPE_LWT_IN**; **BPF_LWT_ENCAP_IP** type can + * be called by bpf programs of types **BPF_PROG_TYPE_LWT_IN** and + * **BPF_PROG_TYPE_LWT_XMIT**. + * + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2087,7 +2087,7 @@ union bpf_attr { * inside the outermost IPv6 Segment Routing Header can be * modified through this helper. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2103,7 +2103,7 @@ union bpf_attr { * after the segments are accepted. *delta* can be as well * positive (growing) as negative (shrinking). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2126,13 +2126,13 @@ union bpf_attr { * Type of *param*: **int**. * **SEG6_LOCAL_ACTION_END_B6** * End.B6 action: Endpoint bound to an SRv6 policy. - * Type of param: **struct ipv6_sr_hdr**. + * Type of *param*: **struct ipv6_sr_hdr**. * **SEG6_LOCAL_ACTION_END_B6_ENCAP** * End.B6.Encap action: Endpoint bound to an SRv6 * encapsulation policy. - * Type of param: **struct ipv6_sr_hdr**. + * Type of *param*: **struct ipv6_sr_hdr**. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2285,7 +2285,8 @@ union bpf_attr { * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. * For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. 
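 * * Editor's sketch (illustrative, not part of this header): a typical * lookup pairs this helper with **bpf_sk_release**\ (), e.g. in a * tc/BPF program: * * struct bpf_sock_tuple tuple = {}; * struct bpf_sock *sk; * * (fill tuple.ipv4 from the parsed packet headers) * sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4), * BPF_F_CURRENT_NETNS, 0); * if (sk) * bpf_sk_release(sk);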
* * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags) * Description @@ -2321,7 +2322,8 @@ union bpf_attr { * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. * For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. * * int bpf_sk_release(struct bpf_sock *sock) * Description @@ -2490,31 +2492,34 @@ union bpf_attr { * network namespace *netns*. The return value must be checked, * and if non-**NULL**, released via **bpf_sk_release**\ (). * - * This function is identical to bpf_sk_lookup_tcp, except that it - * also returns timewait or request sockets. Use bpf_sk_fullsock - * or bpf_tcp_socket to access the full structure. + * This function is identical to **bpf_sk_lookup_tcp**\ (), except + * that it also returns timewait or request sockets. Use + * **bpf_sk_fullsock**\ () or **bpf_tcp_sock**\ () to access the + * full structure. * * This helper is available only if the kernel was compiled with * **CONFIG_NET** configuration option. * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. * For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. * * int bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len) * Description - * Check whether iph and th contain a valid SYN cookie ACK for - * the listening socket in sk. + * Check whether *iph* and *th* contain a valid SYN cookie ACK for + * the listening socket in *sk*. * - * iph points to the start of the IPv4 or IPv6 header, while - * iph_len contains sizeof(struct iphdr) or sizeof(struct ip6hdr). + * *iph* points to the start of the IPv4 or IPv6 header, while + * *iph_len* contains **sizeof**\ (**struct iphdr**) or + * **sizeof**\ (**struct ip6hdr**). * - * th points to the start of the TCP header, while th_len contains - * sizeof(struct tcphdr). + * *th* points to the start of the TCP header, while *th_len* + * contains **sizeof**\ (**struct tcphdr**). * * Return - * 0 if iph and th are a valid SYN cookie ACK, or a negative error - * otherwise. + * 0 if *iph* and *th* are a valid SYN cookie ACK, or a negative + * error otherwise. * * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags) * Description @@ -2592,17 +2597,17 @@ union bpf_attr { * and save the result in *res*. * * The string may begin with an arbitrary amount of white space - * (as determined by isspace(3)) followed by a single optional '-' - * sign. + * (as determined by **isspace**\ (3)) followed by a single + * optional '**-**' sign. * * Five least significant bits of *flags* encode base, other bits * are currently unused. * * Base must be either 8, 10, 16 or 0 to detect it automatically - * similar to user space strtol(3). + * similar to user space **strtol**\ (3). * Return * Number of characters consumed on success. Must be positive but - * no more than buf_len. + * no more than *buf_len*. * * **-EINVAL** if no valid digits were found or unsupported base * was provided. @@ -2616,16 +2621,16 @@ union bpf_attr { * given base and save the result in *res*. * * The string may begin with an arbitrary amount of white space - * (as determined by isspace(3)). + * (as determined by **isspace**\ (3)). 
* * Five least significant bits of *flags* encode base, other bits * are currently unused. * * Base must be either 8, 10, 16 or 0 to detect it automatically - * similar to user space strtoul(3). + * similar to user space **strtoul**\ (3). * Return * Number of characters consumed on success. Must be positive but - * no more than buf_len. + * no more than *buf_len*. * * **-EINVAL** if no valid digits were found or unsupported base * was provided. @@ -2634,26 +2639,26 @@ union bpf_attr { * * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags) * Description - * Get a bpf-local-storage from a sk. + * Get a bpf-local-storage from a *sk*. * * Logically, it could be thought of getting the value from * a *map* with *sk* as the **key**. From this * perspective, the usage is not much different from - * **bpf_map_lookup_elem(map, &sk)** except this - * helper enforces the key must be a **bpf_fullsock()** - * and the map must be a BPF_MAP_TYPE_SK_STORAGE also. + * **bpf_map_lookup_elem**\ (*map*, **&**\ *sk*) except this + * helper enforces the key must be a full socket and the map must + * be a **BPF_MAP_TYPE_SK_STORAGE** also. * * Underneath, the value is stored locally at *sk* instead of - * the map. The *map* is used as the bpf-local-storage **type**. - * The bpf-local-storage **type** (i.e. the *map*) is searched - * against all bpf-local-storages residing at sk. + * the *map*. The *map* is used as the bpf-local-storage + * "type". The bpf-local-storage "type" (i.e. the *map*) is + * searched against all bpf-local-storages residing at *sk*. * - * An optional *flags* (BPF_SK_STORAGE_GET_F_CREATE) can be + * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be * used such that a new bpf-local-storage will be * created if one does not exist. *value* can be used - * together with BPF_SK_STORAGE_GET_F_CREATE to specify + * together with **BPF_SK_STORAGE_GET_F_CREATE** to specify * the initial value of a bpf-local-storage. If *value* is - * NULL, the new bpf-local-storage will be zero initialized. + * **NULL**, the new bpf-local-storage will be zero initialized. * Return * A bpf-local-storage pointer is returned on success. * @@ -2662,7 +2667,7 @@ union bpf_attr { * * int bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk) * Description - * Delete a bpf-local-storage from a sk. + * Delete a bpf-local-storage from a *sk*. * Return * 0 on success. 
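 * * Editor's sketch (illustrative, not part of this header): a per-socket * counter kept in a **BPF_MAP_TYPE_SK_STORAGE** map, using libbpf-style * BTF map definitions: * * struct { * __uint(type, BPF_MAP_TYPE_SK_STORAGE); * __uint(map_flags, BPF_F_NO_PREALLOC); * __type(key, int); * __type(value, __u64); * } sk_cnt SEC(".maps"); * * __u64 *cnt = bpf_sk_storage_get(&sk_cnt, sk, NULL, * BPF_SK_STORAGE_GET_F_CREATE); * if (cnt) * (*cnt)++;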
* diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 2ac598614a8f..19fb55e3c73e 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -44,6 +44,7 @@ * - add lock_owner field to fuse_setattr_in, fuse_read_in and fuse_write_in * - add blksize field to fuse_attr * - add file flags field to fuse_read_in and fuse_write_in + * - Add ATIME_NOW and MTIME_NOW flags to fuse_setattr_in * * 7.10 * - add nonseekable open flag @@ -54,7 +55,7 @@ * - add POLL message and NOTIFY_POLL notification * * 7.12 - * - add umask flag to input argument of open, mknod and mkdir + * - add umask flag to input argument of create, mknod and mkdir * - add notification messages for invalidation of inodes and * directory entries * @@ -125,6 +126,10 @@ * * 7.29 * - add FUSE_NO_OPENDIR_SUPPORT flag + * + * 7.30 + * - add FUSE_EXPLICIT_INVAL_DATA + * - add FUSE_IOCTL_COMPAT_X32 */ #ifndef _LINUX_FUSE_H @@ -160,7 +165,7 @@ #define FUSE_KERNEL_VERSION 7 /** Minor version number of this interface */ -#define FUSE_KERNEL_MINOR_VERSION 29 +#define FUSE_KERNEL_MINOR_VERSION 30 /** The node ID of the root inode */ #define FUSE_ROOT_ID 1 @@ -229,11 +234,13 @@ struct fuse_file_lock { * FOPEN_KEEP_CACHE: don't invalidate the data cache on open * FOPEN_NONSEEKABLE: the file is not seekable * FOPEN_CACHE_DIR: allow caching this directory + * FOPEN_STREAM: the file is stream-like (no file position at all) */ #define FOPEN_DIRECT_IO (1 << 0) #define FOPEN_KEEP_CACHE (1 << 1) #define FOPEN_NONSEEKABLE (1 << 2) #define FOPEN_CACHE_DIR (1 << 3) +#define FOPEN_STREAM (1 << 4) /** * INIT request/reply flags @@ -263,6 +270,7 @@ struct fuse_file_lock { * FUSE_MAX_PAGES: init_out.max_pages contains the max number of req pages * FUSE_CACHE_SYMLINKS: cache READLINK responses * FUSE_NO_OPENDIR_SUPPORT: kernel supports zero-message opendir + * FUSE_EXPLICIT_INVAL_DATA: only invalidate cached pages on explicit request */ #define FUSE_ASYNC_READ (1 << 0) #define FUSE_POSIX_LOCKS (1 << 1) @@ -289,6 +297,7 @@ struct fuse_file_lock { #define FUSE_MAX_PAGES (1 << 22) #define FUSE_CACHE_SYMLINKS (1 << 23) #define FUSE_NO_OPENDIR_SUPPORT (1 << 24) +#define FUSE_EXPLICIT_INVAL_DATA (1 << 25) /** * CUSE INIT request/reply flags @@ -335,6 +344,7 @@ struct fuse_file_lock { * FUSE_IOCTL_RETRY: retry with new iovecs * FUSE_IOCTL_32BIT: 32bit ioctl * FUSE_IOCTL_DIR: is a directory + * FUSE_IOCTL_COMPAT_X32: x32 compat ioctl on 64bit machine (64bit time_t) * * FUSE_IOCTL_MAX_IOV: maximum of in_iovecs + out_iovecs */ @@ -343,6 +353,7 @@ struct fuse_file_lock { #define FUSE_IOCTL_RETRY (1 << 2) #define FUSE_IOCTL_32BIT (1 << 3) #define FUSE_IOCTL_DIR (1 << 4) +#define FUSE_IOCTL_COMPAT_X32 (1 << 5) #define FUSE_IOCTL_MAX_IOV 256 @@ -353,6 +364,13 @@ struct fuse_file_lock { */ #define FUSE_POLL_SCHEDULE_NOTIFY (1 << 0) +/** + * Fsync flags + * + * FUSE_FSYNC_FDATASYNC: Sync data only, not metadata + */ +#define FUSE_FSYNC_FDATASYNC (1 << 0) + enum fuse_opcode { FUSE_LOOKUP = 1, FUSE_FORGET = 2, /* no reply */ diff --git a/include/uapi/linux/input-event-codes.h b/include/uapi/linux/input-event-codes.h index 64cee116928e..85387c76c24f 100644 --- a/include/uapi/linux/input-event-codes.h +++ b/include/uapi/linux/input-event-codes.h @@ -606,6 +606,7 @@ #define KEY_SCREENSAVER 0x245 /* AL Screen Saver */ #define KEY_VOICECOMMAND 0x246 /* Listening Voice Command */ #define KEY_ASSISTANT 0x247 /* AL Context-aware desktop assistant */ +#define KEY_KBD_LAYOUT_NEXT 0x248 /* AC Next Keyboard Layout Select */ #define KEY_BRIGHTNESS_MIN 0x250 
/* Set Brightness to Minimum */ #define KEY_BRIGHTNESS_MAX 0x251 /* Set Brightness to Maximum */ diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index f0cf7b0f4f35..505393c6e959 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -966,7 +966,6 @@ enum nft_socket_keys { * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address) * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address) * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address) - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack * @NFT_CT_ID: conntrack id */ enum nft_ct_keys { @@ -993,7 +992,6 @@ enum nft_ct_keys { NFT_CT_DST_IP, NFT_CT_SRC_IP6, NFT_CT_DST_IP6, - NFT_CT_TIMEOUT, NFT_CT_ID, __NFT_CT_MAX }; @@ -1138,7 +1136,7 @@ enum nft_log_level { NFT_LOGLEVEL_AUDIT, __NFT_LOGLEVEL_MAX }; -#define NFT_LOGLEVEL_MAX (__NFT_LOGLEVEL_MAX + 1) +#define NFT_LOGLEVEL_MAX (__NFT_LOGLEVEL_MAX - 1) /** * enum nft_queue_attributes - nf_tables queue expression netlink attributes diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 3ba56e73c90e..242a643af82f 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -338,7 +338,7 @@ int bpf_prog_calc_tag(struct bpf_prog *fp) } static int bpf_adj_delta_to_imm(struct bpf_insn *insn, u32 pos, s32 end_old, - s32 end_new, u32 curr, const bool probe_pass) + s32 end_new, s32 curr, const bool probe_pass) { const s64 imm_min = S32_MIN, imm_max = S32_MAX; s32 delta = end_new - end_old; @@ -356,7 +356,7 @@ static int bpf_adj_delta_to_imm(struct bpf_insn *insn, u32 pos, s32 end_old, } static int bpf_adj_delta_to_off(struct bpf_insn *insn, u32 pos, s32 end_old, - s32 end_new, u32 curr, const bool probe_pass) + s32 end_new, s32 curr, const bool probe_pass) { const s32 off_min = S16_MIN, off_max = S16_MAX; s32 delta = end_new - end_old; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 7b05e8938d5c..95f9354495ad 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7599,7 +7599,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) insn->dst_reg, shift); insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg, - (1 << size * 8) - 1); + (1ULL << size * 8) - 1); } } diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h index b1739dc06b73..0468ba500bd4 100644 --- a/mm/percpu-internal.h +++ b/mm/percpu-internal.h @@ -9,8 +9,17 @@ * pcpu_block_md is the metadata block struct. * Each chunk's bitmap is split into a number of full blocks. * All units are in terms of bits. + * + * The scan hint is the largest known contiguous area before the contig hint. + * It is not necessarily the actual largest contig hint though. There is an + * invariant that the scan_hint_start > contig_hint_start iff + * scan_hint == contig_hint. This is necessary because when scanning forward, + * we don't know if a new contig hint would be better than the current one. 
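+ * + * Editor's example (illustrative, not from the patch): if a block has + * free runs of 3 bits at offset 0 and 5 bits at offset 8, then + * contig_hint == 5 with contig_hint_start == 8, and the earlier 3-bit + * run is remembered as scan_hint == 3 with scan_hint_start == 0.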
*/ struct pcpu_block_md { + int scan_hint; /* scan hint for block */ + int scan_hint_start; /* block relative starting + position of the scan hint */ int contig_hint; /* contig hint for block */ int contig_hint_start; /* block relative starting position of the contig hint */ @@ -19,6 +28,7 @@ struct pcpu_block_md { int right_free; /* size of free space along the right side of the block */ int first_free; /* block position of first free */ + int nr_bits; /* total bits responsible for */ }; struct pcpu_chunk { @@ -29,9 +39,7 @@ struct pcpu_chunk { struct list_head list; /* linked to pcpu_slot lists */ int free_bytes; /* free bytes in the chunk */ - int contig_bits; /* max contiguous size hint */ - int contig_bits_start; /* contig_bits starting - offset */ + struct pcpu_block_md chunk_md; void *base_addr; /* base address of this chunk */ unsigned long *alloc_map; /* allocation map */ @@ -39,7 +47,6 @@ struct pcpu_chunk { struct pcpu_block_md *md_blocks; /* metadata blocks */ void *data; /* chunk data */ - int first_bit; /* no free below this */ bool immutable; /* no [de]population allowed */ int start_offset; /* the overlap with the previous region to have a page aligned diff --git a/mm/percpu-km.c b/mm/percpu-km.c index b68d5df14731..3a2ff5c9192c 100644 --- a/mm/percpu-km.c +++ b/mm/percpu-km.c @@ -70,7 +70,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) chunk->base_addr = page_address(pages); spin_lock_irqsave(&pcpu_lock, flags); - pcpu_chunk_populated(chunk, 0, nr_pages, false); + pcpu_chunk_populated(chunk, 0, nr_pages); spin_unlock_irqrestore(&pcpu_lock, flags); pcpu_stats_chunk_alloc(); diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c index b5fdd43b60c9..ef5034a0464e 100644 --- a/mm/percpu-stats.c +++ b/mm/percpu-stats.c @@ -53,6 +53,7 @@ static int find_max_nr_alloc(void) static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk, int *buffer) { + struct pcpu_block_md *chunk_md = &chunk->chunk_md; int i, last_alloc, as_len, start, end; int *alloc_sizes, *p; /* statistics */ @@ -121,9 +122,9 @@ static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk, P("nr_alloc", chunk->nr_alloc); P("max_alloc_size", chunk->max_alloc_size); P("empty_pop_pages", chunk->nr_empty_pop_pages); - P("first_bit", chunk->first_bit); + P("first_bit", chunk_md->first_free); P("free_bytes", chunk->free_bytes); - P("contig_bytes", chunk->contig_bits * PCPU_MIN_ALLOC_SIZE); + P("contig_bytes", chunk_md->contig_hint * PCPU_MIN_ALLOC_SIZE); P("sum_frag", sum_frag); P("max_frag", max_frag); P("cur_min_alloc", cur_min_alloc); diff --git a/mm/percpu.c b/mm/percpu.c index 68dd2e7e73b5..2df0ee680ea6 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -94,6 +94,8 @@ /* the slots are sorted by free bytes left, 1-31 bytes share the same slot */ #define PCPU_SLOT_BASE_SHIFT 5 +/* chunks in slots below this are subject to being sidelined on failed alloc */ +#define PCPU_SLOT_FAIL_THRESHOLD 3 #define PCPU_EMPTY_POP_PAGES_LOW 2 #define PCPU_EMPTY_POP_PAGES_HIGH 4 @@ -231,10 +233,13 @@ static int pcpu_size_to_slot(int size) static int pcpu_chunk_slot(const struct pcpu_chunk *chunk) { - if (chunk->free_bytes < PCPU_MIN_ALLOC_SIZE || chunk->contig_bits == 0) + const struct pcpu_block_md *chunk_md = &chunk->chunk_md; + + if (chunk->free_bytes < PCPU_MIN_ALLOC_SIZE || + chunk_md->contig_hint == 0) return 0; - return pcpu_size_to_slot(chunk->free_bytes); + return pcpu_size_to_slot(chunk_md->contig_hint * PCPU_MIN_ALLOC_SIZE); } /* set the pointer to a chunk in a page struct */ @@ -318,6 +323,34 @@ 
static unsigned long pcpu_block_off_to_off(int index, int off) return index * PCPU_BITMAP_BLOCK_BITS + off; } +/* + * pcpu_next_hint - determine which hint to use + * @block: block of interest + * @alloc_bits: size of allocation + * + * This determines if we should scan based on the scan_hint or first_free. + * In general, we want to scan from first_free to fulfill allocations by + * first fit. However, if we know a scan_hint at position scan_hint_start + * cannot fulfill an allocation, we can begin scanning from there knowing + * the contig_hint will be our fallback. + */ +static int pcpu_next_hint(struct pcpu_block_md *block, int alloc_bits) +{ + /* + * The three conditions below determine if we can skip past the + * scan_hint. First, does the scan hint exist. Second, is the + * contig_hint after the scan_hint (possibly not true iff + * contig_hint == scan_hint). Third, is the allocation request + * larger than the scan_hint. + */ + if (block->scan_hint && + block->contig_hint_start > block->scan_hint_start && + alloc_bits > block->scan_hint) + return block->scan_hint_start + block->scan_hint; + + return block->first_free; +} + /** * pcpu_next_md_free_region - finds the next hint free area * @chunk: chunk of interest @@ -413,9 +446,11 @@ static void pcpu_next_fit_region(struct pcpu_chunk *chunk, int alloc_bits, if (block->contig_hint && block->contig_hint_start >= block_off && block->contig_hint >= *bits + alloc_bits) { + int start = pcpu_next_hint(block, alloc_bits); + *bits += alloc_bits + block->contig_hint_start - - block->first_free; - *bit_off = pcpu_block_off_to_off(i, block->first_free); + start; + *bit_off = pcpu_block_off_to_off(i, start); return; } /* reset to satisfy the second predicate above */ @@ -488,6 +523,22 @@ static void pcpu_mem_free(void *ptr) kvfree(ptr); } +static void __pcpu_chunk_move(struct pcpu_chunk *chunk, int slot, + bool move_front) +{ + if (chunk != pcpu_reserved_chunk) { + if (move_front) + list_move(&chunk->list, &pcpu_slot[slot]); + else + list_move_tail(&chunk->list, &pcpu_slot[slot]); + } +} + +static void pcpu_chunk_move(struct pcpu_chunk *chunk, int slot) +{ + __pcpu_chunk_move(chunk, slot, true); +} + /** * pcpu_chunk_relocate - put chunk in the appropriate chunk slot * @chunk: chunk of interest @@ -505,110 +556,39 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot) { int nslot = pcpu_chunk_slot(chunk); - if (chunk != pcpu_reserved_chunk && oslot != nslot) { - if (oslot < nslot) - list_move(&chunk->list, &pcpu_slot[nslot]); - else - list_move_tail(&chunk->list, &pcpu_slot[nslot]); - } + if (oslot != nslot) + __pcpu_chunk_move(chunk, nslot, oslot < nslot); } -/** - * pcpu_cnt_pop_pages- counts populated backing pages in range +/* + * pcpu_update_empty_pages - update empty page counters * @chunk: chunk of interest - * @bit_off: start offset - * @bits: size of area to check - * - * Calculates the number of populated pages in the region - * [page_start, page_end). This keeps track of how many empty populated - * pages are available and decide if async work should be scheduled. + * @nr: nr of empty pages * - * RETURNS: - * The nr of populated pages. + * This is used to keep track of the empty pages now based on the premise + * a md_block covers a page. The hint update functions recognize if a block + * is made full or broken to calculate deltas for keeping track of free pages. 
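+ * + * Editor's note (illustrative): because a block covers exactly one page, + * a block whose free run grows to PCPU_BITMAP_BLOCK_BITS contributes a + * delta of +1 empty page, and a block that loses that property a delta + * of -1; the hint update paths pass those deltas here.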
*/ -static inline int pcpu_cnt_pop_pages(struct pcpu_chunk *chunk, int bit_off, - int bits) +static inline void pcpu_update_empty_pages(struct pcpu_chunk *chunk, int nr) { - int page_start = PFN_UP(bit_off * PCPU_MIN_ALLOC_SIZE); - int page_end = PFN_DOWN((bit_off + bits) * PCPU_MIN_ALLOC_SIZE); - - if (page_start >= page_end) - return 0; - - /* - * bitmap_weight counts the number of bits set in a bitmap up to - * the specified number of bits. This is counting the populated - * pages up to page_end and then subtracting the populated pages - * up to page_start to count the populated pages in - * [page_start, page_end). - */ - return bitmap_weight(chunk->populated, page_end) - - bitmap_weight(chunk->populated, page_start); -} - -/** - * pcpu_chunk_update - updates the chunk metadata given a free area - * @chunk: chunk of interest - * @bit_off: chunk offset - * @bits: size of free area - * - * This updates the chunk's contig hint and starting offset given a free area. - * Choose the best starting offset if the contig hint is equal. - */ -static void pcpu_chunk_update(struct pcpu_chunk *chunk, int bit_off, int bits) -{ - if (bits > chunk->contig_bits) { - chunk->contig_bits_start = bit_off; - chunk->contig_bits = bits; - } else if (bits == chunk->contig_bits && chunk->contig_bits_start && - (!bit_off || - __ffs(bit_off) > __ffs(chunk->contig_bits_start))) { - /* use the start with the best alignment */ - chunk->contig_bits_start = bit_off; - } + chunk->nr_empty_pop_pages += nr; + if (chunk != pcpu_reserved_chunk) + pcpu_nr_empty_pop_pages += nr; } -/** - * pcpu_chunk_refresh_hint - updates metadata about a chunk - * @chunk: chunk of interest - * - * Iterates over the metadata blocks to find the largest contig area. - * It also counts the populated pages and uses the delta to update the - * global count. - * - * Updates: - * chunk->contig_bits - * chunk->contig_bits_start - * nr_empty_pop_pages (chunk and global) +/* + * pcpu_region_overlap - determines if two regions overlap + * @a: start of first region, inclusive + * @b: end of first region, exclusive + * @x: start of second region, inclusive + * @y: end of second region, exclusive + * + * This is used to determine if the hint region [a, b) overlaps with the + * allocated region [x, y). */ -static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk) +static inline bool pcpu_region_overlap(int a, int b, int x, int y) { - int bit_off, bits, nr_empty_pop_pages; - - /* clear metadata */ - chunk->contig_bits = 0; - - bit_off = chunk->first_bit; - bits = nr_empty_pop_pages = 0; - pcpu_for_each_md_free_region(chunk, bit_off, bits) { - pcpu_chunk_update(chunk, bit_off, bits); - - nr_empty_pop_pages += pcpu_cnt_pop_pages(chunk, bit_off, bits); - } - - /* - * Keep track of nr_empty_pop_pages. - * - * The chunk maintains the previous number of free pages it held, - * so the delta is used to update the global counter. The reserved - * chunk is not part of the free page count as they are populated - * at init and are special to serving reserved allocations. 
- */ - if (chunk != pcpu_reserved_chunk) - pcpu_nr_empty_pop_pages += - (nr_empty_pop_pages - chunk->nr_empty_pop_pages); - - chunk->nr_empty_pop_pages = nr_empty_pop_pages; + return (a < y) && (x < b); } /** @@ -629,16 +609,132 @@ static void pcpu_block_update(struct pcpu_block_md *block, int start, int end) if (start == 0) block->left_free = contig; - if (end == PCPU_BITMAP_BLOCK_BITS) + if (end == block->nr_bits) block->right_free = contig; if (contig > block->contig_hint) { + /* promote the old contig_hint to be the new scan_hint */ + if (start > block->contig_hint_start) { + if (block->contig_hint > block->scan_hint) { + block->scan_hint_start = + block->contig_hint_start; + block->scan_hint = block->contig_hint; + } else if (start < block->scan_hint_start) { + /* + * The old contig_hint == scan_hint. But, the + * new contig is larger so hold the invariant + * scan_hint_start < contig_hint_start. + */ + block->scan_hint = 0; + } + } else { + block->scan_hint = 0; + } block->contig_hint_start = start; block->contig_hint = contig; - } else if (block->contig_hint_start && contig == block->contig_hint && - (!start || __ffs(start) > __ffs(block->contig_hint_start))) { - /* use the start with the best alignment */ - block->contig_hint_start = start; + } else if (contig == block->contig_hint) { + if (block->contig_hint_start && + (!start || + __ffs(start) > __ffs(block->contig_hint_start))) { + /* start has a better alignment so use it */ + block->contig_hint_start = start; + if (start < block->scan_hint_start && + block->contig_hint > block->scan_hint) + block->scan_hint = 0; + } else if (start > block->scan_hint_start || + block->contig_hint > block->scan_hint) { + /* + * Knowing contig == contig_hint, update the scan_hint + * if it is farther than or larger than the current + * scan_hint. + */ + block->scan_hint_start = start; + block->scan_hint = contig; + } + } else { + /* + * The region is smaller than the contig_hint. So only update + * the scan_hint if it is larger than or equal and farther than + * the current scan_hint. + */ + if ((start < block->contig_hint_start && + (contig > block->scan_hint || + (contig == block->scan_hint && + start > block->scan_hint_start)))) { + block->scan_hint_start = start; + block->scan_hint = contig; + } + } +} + +/* + * pcpu_block_update_scan - update a block given a free area from a scan + * @chunk: chunk of interest + * @bit_off: chunk offset + * @bits: size of free area + * + * Finding the final allocation spot first goes through pcpu_find_block_fit() + * to find a block that can hold the allocation and then pcpu_alloc_area() + * where a scan is used. When allocations require specific alignments, + * we can inadvertently create holes which will not be seen in the alloc + * or free paths. + * + * This takes a given free area hole and updates a block as it may change the + * scan_hint. We need to scan backwards to ensure we don't miss free bits + * from alignment. + */ +static void pcpu_block_update_scan(struct pcpu_chunk *chunk, int bit_off, + int bits) +{ + int s_off = pcpu_off_to_block_off(bit_off); + int e_off = s_off + bits; + int s_index, l_bit; + struct pcpu_block_md *block; + + if (e_off > PCPU_BITMAP_BLOCK_BITS) + return; + + s_index = pcpu_off_to_block_index(bit_off); + block = chunk->md_blocks + s_index; + + /* scan backwards in case of alignment skipping free bits */ + l_bit = find_last_bit(pcpu_index_alloc_map(chunk, s_index), s_off); + s_off = (s_off == l_bit) ? 
0 : l_bit + 1; + + pcpu_block_update(block, s_off, e_off); +} + +/** + * pcpu_chunk_refresh_hint - updates metadata about a chunk + * @chunk: chunk of interest + * @full_scan: if we should scan from the beginning + * + * Iterates over the metadata blocks to find the largest contig area. + * A full scan can be avoided on the allocation path as this is triggered + * if we broke the contig_hint. In doing so, the scan_hint will be before + * the contig_hint or after if the scan_hint == contig_hint. This cannot + * be prevented on freeing as we want to find the largest area possibly + * spanning blocks. + */ +static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk, bool full_scan) +{ + struct pcpu_block_md *chunk_md = &chunk->chunk_md; + int bit_off, bits; + + /* promote scan_hint to contig_hint */ + if (!full_scan && chunk_md->scan_hint) { + bit_off = chunk_md->scan_hint_start + chunk_md->scan_hint; + chunk_md->contig_hint_start = chunk_md->scan_hint_start; + chunk_md->contig_hint = chunk_md->scan_hint; + chunk_md->scan_hint = 0; + } else { + bit_off = chunk_md->first_free; + chunk_md->contig_hint = 0; + } + + bits = 0; + pcpu_for_each_md_free_region(chunk, bit_off, bits) { + pcpu_block_update(chunk_md, bit_off, bit_off + bits); } } @@ -654,14 +750,23 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index) { struct pcpu_block_md *block = chunk->md_blocks + index; unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index); - int rs, re; /* region start, region end */ + int rs, re, start; /* region start, region end */ + + /* promote scan_hint to contig_hint */ + if (block->scan_hint) { + start = block->scan_hint_start + block->scan_hint; + block->contig_hint_start = block->scan_hint_start; + block->contig_hint = block->scan_hint; + block->scan_hint = 0; + } else { + start = block->first_free; + block->contig_hint = 0; + } - /* clear hints */ - block->contig_hint = 0; - block->left_free = block->right_free = 0; + block->right_free = 0; /* iterate over free areas and update the contig hints */ - pcpu_for_each_unpop_region(alloc_map, rs, re, block->first_free, + pcpu_for_each_unpop_region(alloc_map, rs, re, start, PCPU_BITMAP_BLOCK_BITS) { pcpu_block_update(block, rs, re); } @@ -680,6 +785,8 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index) static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, int bits) { + struct pcpu_block_md *chunk_md = &chunk->chunk_md; + int nr_empty_pages = 0; struct pcpu_block_md *s_block, *e_block, *block; int s_index, e_index; /* block indexes of the freed allocation */ int s_off, e_off; /* block offsets of the freed allocation */ @@ -704,15 +811,29 @@ static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, * If the allocation breaks the contig_hint, a scan is required to * restore this hint. 
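 * * Editor's example (illustrative): with contig_hint == 10 and * contig_hint_start == 4, an allocation of 4 bits at s_off == 8 overlaps * the hint region [4, 14) and triggers pcpu_block_refresh_hint(), while * the same allocation at s_off == 20 leaves the hint intact.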
*/ + if (s_block->contig_hint == PCPU_BITMAP_BLOCK_BITS) + nr_empty_pages++; + if (s_off == s_block->first_free) s_block->first_free = find_next_zero_bit( pcpu_index_alloc_map(chunk, s_index), PCPU_BITMAP_BLOCK_BITS, s_off + bits); - if (s_off >= s_block->contig_hint_start && - s_off < s_block->contig_hint_start + s_block->contig_hint) { + if (pcpu_region_overlap(s_block->scan_hint_start, + s_block->scan_hint_start + s_block->scan_hint, + s_off, + s_off + bits)) + s_block->scan_hint = 0; + + if (pcpu_region_overlap(s_block->contig_hint_start, + s_block->contig_hint_start + + s_block->contig_hint, + s_off, + s_off + bits)) { /* block contig hint is broken - scan to fix it */ + if (!s_off) + s_block->left_free = 0; pcpu_block_refresh_hint(chunk, s_index); } else { /* update left and right contig manually */ @@ -728,6 +849,9 @@ static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, * Update e_block. */ if (s_index != e_index) { + if (e_block->contig_hint == PCPU_BITMAP_BLOCK_BITS) + nr_empty_pages++; + /* * When the allocation is across blocks, the end is along * the left part of the e_block. @@ -740,11 +864,14 @@ static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, /* reset the block */ e_block++; } else { + if (e_off > e_block->scan_hint_start) + e_block->scan_hint = 0; + + e_block->left_free = 0; if (e_off > e_block->contig_hint_start) { /* contig hint is broken - scan to fix it */ pcpu_block_refresh_hint(chunk, e_index); } else { - e_block->left_free = 0; e_block->right_free = min_t(int, e_block->right_free, PCPU_BITMAP_BLOCK_BITS - e_off); @@ -752,21 +879,36 @@ static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, } /* update in-between md_blocks */ + nr_empty_pages += (e_index - s_index - 1); for (block = s_block + 1; block < e_block; block++) { + block->scan_hint = 0; block->contig_hint = 0; block->left_free = 0; block->right_free = 0; } } + if (nr_empty_pages) + pcpu_update_empty_pages(chunk, -nr_empty_pages); + + if (pcpu_region_overlap(chunk_md->scan_hint_start, + chunk_md->scan_hint_start + + chunk_md->scan_hint, + bit_off, + bit_off + bits)) + chunk_md->scan_hint = 0; + /* * The only time a full chunk scan is required is if the chunk * contig hint is broken. Otherwise, it means a smaller space * was used and therefore the chunk contig hint is still correct. */ - if (bit_off >= chunk->contig_bits_start && - bit_off < chunk->contig_bits_start + chunk->contig_bits) - pcpu_chunk_refresh_hint(chunk); + if (pcpu_region_overlap(chunk_md->contig_hint_start, + chunk_md->contig_hint_start + + chunk_md->contig_hint, + bit_off, + bit_off + bits)) + pcpu_chunk_refresh_hint(chunk, false); } /** @@ -782,13 +924,15 @@ static void pcpu_block_update_hint_alloc(struct pcpu_chunk *chunk, int bit_off, * * A chunk update is triggered if a page becomes free, a block becomes free, * or the free spans across blocks. This tradeoff is to minimize iterating - * over the block metadata to update chunk->contig_bits. chunk->contig_bits - * may be off by up to a page, but it will never be more than the available - * space. If the contig hint is contained in one block, it will be accurate. + * over the block metadata to update chunk_md->contig_hint. + * chunk_md->contig_hint may be off by up to a page, but it will never be more + * than the available space. If the contig hint is contained in one block, it + * will be accurate. 
*/ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off, int bits) { + int nr_empty_pages = 0; struct pcpu_block_md *s_block, *e_block, *block; int s_index, e_index; /* block indexes of the freed allocation */ int s_off, e_off; /* block offsets of the freed allocation */ @@ -842,16 +986,22 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off, /* update s_block */ e_off = (s_index == e_index) ? end : PCPU_BITMAP_BLOCK_BITS; + if (!start && e_off == PCPU_BITMAP_BLOCK_BITS) + nr_empty_pages++; pcpu_block_update(s_block, start, e_off); /* freeing in the same block */ if (s_index != e_index) { /* update e_block */ + if (end == PCPU_BITMAP_BLOCK_BITS) + nr_empty_pages++; pcpu_block_update(e_block, 0, end); /* reset md_blocks in the middle */ + nr_empty_pages += (e_index - s_index - 1); for (block = s_block + 1; block < e_block; block++) { block->first_free = 0; + block->scan_hint = 0; block->contig_hint_start = 0; block->contig_hint = PCPU_BITMAP_BLOCK_BITS; block->left_free = PCPU_BITMAP_BLOCK_BITS; @@ -859,19 +1009,21 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off, } } + if (nr_empty_pages) + pcpu_update_empty_pages(chunk, nr_empty_pages); + /* - * Refresh chunk metadata when the free makes a page free, a block - * free, or spans across blocks. The contig hint may be off by up to - * a page, but if the hint is contained in a block, it will be accurate - * with the else condition below. + * Refresh chunk metadata when the free makes a block free or spans + * across blocks. The contig_hint may be off by up to a page, but if + * the contig_hint is contained in a block, it will be accurate with + * the else condition below. */ - if ((ALIGN_DOWN(end, min(PCPU_BITS_PER_PAGE, PCPU_BITMAP_BLOCK_BITS)) > - ALIGN(start, min(PCPU_BITS_PER_PAGE, PCPU_BITMAP_BLOCK_BITS))) || - s_index != e_index) - pcpu_chunk_refresh_hint(chunk); + if (((end - start) >= PCPU_BITMAP_BLOCK_BITS) || s_index != e_index) + pcpu_chunk_refresh_hint(chunk, true); else - pcpu_chunk_update(chunk, pcpu_block_off_to_off(s_index, start), - s_block->contig_hint); + pcpu_block_update(&chunk->chunk_md, + pcpu_block_off_to_off(s_index, start), + end); } /** @@ -926,6 +1078,7 @@ static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits, static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits, size_t align, bool pop_only) { + struct pcpu_block_md *chunk_md = &chunk->chunk_md; int bit_off, bits, next_off; /* @@ -934,12 +1087,12 @@ static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits, * cannot fit in the global hint, there is memory pressure and creating * a new chunk would happen soon. 
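The fit check opening pcpu_find_block_fit() (continued just below) pays for alignment up front: it computes the padding needed to align within the current contig_hint and bails out if the padded request cannot fit, signalling memory pressure. A small standalone model:

    #include <stdio.h>

    #define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

    /* can alloc_bits, aligned to align, fit in the largest known free area? */
    static int fits_contig_hint(int hint_start, int hint, int alloc_bits,
                                int align)
    {
            int pad = ALIGN(hint_start, align) - hint_start;

            return pad + alloc_bits <= hint;
    }

    int main(void)
    {
            /* hint of 8 bits at offset 6: a 4-aligned request pads by 2 */
            printf("%d\n", fits_contig_hint(6, 8, 7, 4)); /* 0: 2 + 7 > 8 */
            printf("%d\n", fits_contig_hint(6, 8, 6, 4)); /* 1: 2 + 6 <= 8 */
            return 0;
    }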
*/ - bit_off = ALIGN(chunk->contig_bits_start, align) - - chunk->contig_bits_start; - if (bit_off + alloc_bits > chunk->contig_bits) + bit_off = ALIGN(chunk_md->contig_hint_start, align) - + chunk_md->contig_hint_start; + if (bit_off + alloc_bits > chunk_md->contig_hint) return -1; - bit_off = chunk->first_bit; + bit_off = pcpu_next_hint(chunk_md, alloc_bits); bits = 0; pcpu_for_each_fit_region(chunk, alloc_bits, align, bit_off, bits) { if (!pop_only || pcpu_is_populated(chunk, bit_off, bits, @@ -956,6 +1109,62 @@ static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits, return bit_off; } +/* + * pcpu_find_zero_area - modified from bitmap_find_next_zero_area_off() + * @map: the address to base the search on + * @size: the bitmap size in bits + * @start: the bitnumber to start searching at + * @nr: the number of zeroed bits we're looking for + * @align_mask: alignment mask for zero area + * @largest_off: offset of the largest area skipped + * @largest_bits: size of the largest area skipped + * + * The @align_mask should be one less than a power of 2. + * + * This is a modified version of bitmap_find_next_zero_area_off() to remember + * the largest area that was skipped. This is imperfect, but in general is + * good enough. The largest remembered region is the largest failed region + * seen. This does not include anything we possibly skipped due to alignment. + * pcpu_block_update_scan() does scan backwards to try and recover what was + * lost to alignment. While this can cause scanning to miss earlier possible + * free areas, smaller allocations will eventually fill those holes. + */ +static unsigned long pcpu_find_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned long nr, + unsigned long align_mask, + unsigned long *largest_off, + unsigned long *largest_bits) +{ + unsigned long index, end, i, area_off, area_bits; +again: + index = find_next_zero_bit(map, size, start); + + /* Align allocation */ + index = __ALIGN_MASK(index, align_mask); + area_off = index; + + end = index + nr; + if (end > size) + return end; + i = find_next_bit(map, end, index); + if (i < end) { + area_bits = i - area_off; + /* remember largest unused area with best alignment */ + if (area_bits > *largest_bits || + (area_bits == *largest_bits && *largest_off && + (!area_off || __ffs(area_off) > __ffs(*largest_off)))) { + *largest_off = area_off; + *largest_bits = area_bits; + } + + start = i + 1; + goto again; + } + return index; +} + /** * pcpu_alloc_area - allocates an area from a pcpu_chunk * @chunk: chunk of interest @@ -978,7 +1187,9 @@ static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits, static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits, size_t align, int start) { + struct pcpu_block_md *chunk_md = &chunk->chunk_md; size_t align_mask = (align) ? (align - 1) : 0; + unsigned long area_off = 0, area_bits = 0; int bit_off, end, oslot; lockdep_assert_held(&pcpu_lock); @@ -988,12 +1199,16 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits, /* * Search to find a fit. 
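pcpu_find_zero_area() is the interesting piece of the allocation fast path: it retries like bitmap_find_next_zero_area_off(), but records the largest free run it had to skip so the caller can feed it back into the hints via pcpu_block_update_scan(). A naive, bit-at-a-time userspace model; the kernel version additionally prefers better-aligned areas on size ties, which is omitted here:

    #include <stdio.h>

    /* naive bit helpers over a byte-array bitmap, bit 0 = LSB of map[0] */
    static int test_bit(const unsigned char *map, unsigned long n)
    {
            return (map[n / 8] >> (n % 8)) & 1;
    }

    static unsigned long next_zero_bit(const unsigned char *map,
                                       unsigned long size, unsigned long start)
    {
            while (start < size && test_bit(map, start))
                    start++;
            return start;
    }

    static unsigned long next_set_bit(const unsigned char *map,
                                      unsigned long size, unsigned long start)
    {
            while (start < size && !test_bit(map, start))
                    start++;
            return start;
    }

    /* retry loop that remembers the largest area that failed to fit */
    static unsigned long find_zero_area(const unsigned char *map,
                                        unsigned long size, unsigned long start,
                                        unsigned long nr,
                                        unsigned long align_mask,
                                        unsigned long *largest_off,
                                        unsigned long *largest_bits)
    {
            unsigned long index, end, i;

            for (;;) {
                    index = next_zero_bit(map, size, start);
                    index = (index + align_mask) & ~align_mask; /* __ALIGN_MASK */
                    end = index + nr;
                    if (end > size)
                            return end;     /* area would run past the bitmap */
                    i = next_set_bit(map, end, index);
                    if (i >= end)
                            return index;   /* success */
                    if (i - index > *largest_bits) {
                            *largest_off = index;   /* remember skipped area */
                            *largest_bits = i - index;
                    }
                    start = i + 1;
            }
    }

    int main(void)
    {
            /* bits 0-2 free, bit 3 busy, bits 4-15 free */
            unsigned char map[2] = { 0x08, 0x00 };
            unsigned long off = 0, bits = 0;
            unsigned long at = find_zero_area(map, 16, 0, 6, 0, &off, &bits);

            printf("placed at %lu, skipped %lu bits at %lu\n", at, bits, off);
            /* placed at 4, skipped 3 bits at 0 */
            return 0;
    }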
*/ - end = start + alloc_bits + PCPU_BITMAP_BLOCK_BITS; - bit_off = bitmap_find_next_zero_area(chunk->alloc_map, end, start, - alloc_bits, align_mask); + end = min_t(int, start + alloc_bits + PCPU_BITMAP_BLOCK_BITS, + pcpu_chunk_map_bits(chunk)); + bit_off = pcpu_find_zero_area(chunk->alloc_map, end, start, alloc_bits, + align_mask, &area_off, &area_bits); if (bit_off >= end) return -1; + if (area_bits) + pcpu_block_update_scan(chunk, area_off, area_bits); + /* update alloc map */ bitmap_set(chunk->alloc_map, bit_off, alloc_bits); @@ -1005,8 +1220,8 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits, chunk->free_bytes -= alloc_bits * PCPU_MIN_ALLOC_SIZE; /* update first free bit */ - if (bit_off == chunk->first_bit) - chunk->first_bit = find_next_zero_bit( + if (bit_off == chunk_md->first_free) + chunk_md->first_free = find_next_zero_bit( chunk->alloc_map, pcpu_chunk_map_bits(chunk), bit_off + alloc_bits); @@ -1028,6 +1243,7 @@ static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits, */ static void pcpu_free_area(struct pcpu_chunk *chunk, int off) { + struct pcpu_block_md *chunk_md = &chunk->chunk_md; int bit_off, bits, end, oslot; lockdep_assert_held(&pcpu_lock); @@ -1047,24 +1263,34 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int off) chunk->free_bytes += bits * PCPU_MIN_ALLOC_SIZE; /* update first free bit */ - chunk->first_bit = min(chunk->first_bit, bit_off); + chunk_md->first_free = min(chunk_md->first_free, bit_off); pcpu_block_update_hint_free(chunk, bit_off, bits); pcpu_chunk_relocate(chunk, oslot); } +static void pcpu_init_md_block(struct pcpu_block_md *block, int nr_bits) +{ + block->scan_hint = 0; + block->contig_hint = nr_bits; + block->left_free = nr_bits; + block->right_free = nr_bits; + block->first_free = 0; + block->nr_bits = nr_bits; +} + static void pcpu_init_md_blocks(struct pcpu_chunk *chunk) { struct pcpu_block_md *md_block; + /* init the chunk's block */ + pcpu_init_md_block(&chunk->chunk_md, pcpu_chunk_map_bits(chunk)); + for (md_block = chunk->md_blocks; md_block != chunk->md_blocks + pcpu_chunk_nr_blocks(chunk); - md_block++) { - md_block->contig_hint = PCPU_BITMAP_BLOCK_BITS; - md_block->left_free = PCPU_BITMAP_BLOCK_BITS; - md_block->right_free = PCPU_BITMAP_BLOCK_BITS; - } + md_block++) + pcpu_init_md_block(md_block, PCPU_BITMAP_BLOCK_BITS); } /** @@ -1143,11 +1369,8 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, chunk->immutable = true; bitmap_fill(chunk->populated, chunk->nr_pages); chunk->nr_populated = chunk->nr_pages; - chunk->nr_empty_pop_pages = - pcpu_cnt_pop_pages(chunk, start_offset / PCPU_MIN_ALLOC_SIZE, - map_size / PCPU_MIN_ALLOC_SIZE); + chunk->nr_empty_pop_pages = chunk->nr_pages; - chunk->contig_bits = map_size / PCPU_MIN_ALLOC_SIZE; chunk->free_bytes = map_size; if (chunk->start_offset) { @@ -1157,7 +1380,7 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, set_bit(0, chunk->bound_map); set_bit(offset_bits, chunk->bound_map); - chunk->first_bit = offset_bits; + chunk->chunk_md.first_free = offset_bits; pcpu_block_update_hint_alloc(chunk, 0, offset_bits); } @@ -1210,7 +1433,6 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) pcpu_init_md_blocks(chunk); /* init metadata */ - chunk->contig_bits = region_bits; chunk->free_bytes = chunk->nr_pages * PAGE_SIZE; return chunk; @@ -1240,7 +1462,6 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk) * @chunk: pcpu_chunk which got populated * @page_start: the start page * @page_end: 
the end page - * @for_alloc: if this is to populate for allocation * * Pages in [@page_start,@page_end) have been populated to @chunk. Update * the bookkeeping information accordingly. Must be called after each @@ -1250,7 +1471,7 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk) * is to serve an allocation in that area. */ static void pcpu_chunk_populated(struct pcpu_chunk *chunk, int page_start, - int page_end, bool for_alloc) + int page_end) { int nr = page_end - page_start; @@ -1260,10 +1481,7 @@ static void pcpu_chunk_populated(struct pcpu_chunk *chunk, int page_start, chunk->nr_populated += nr; pcpu_nr_populated += nr; - if (!for_alloc) { - chunk->nr_empty_pop_pages += nr; - pcpu_nr_empty_pop_pages += nr; - } + pcpu_update_empty_pages(chunk, nr); } /** @@ -1285,9 +1503,9 @@ static void pcpu_chunk_depopulated(struct pcpu_chunk *chunk, bitmap_clear(chunk->populated, page_start, nr); chunk->nr_populated -= nr; - chunk->nr_empty_pop_pages -= nr; - pcpu_nr_empty_pop_pages -= nr; pcpu_nr_populated -= nr; + + pcpu_update_empty_pages(chunk, -nr); } /* @@ -1374,7 +1592,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL; bool do_warn = !(gfp & __GFP_NOWARN); static int warn_limit = 10; - struct pcpu_chunk *chunk; + struct pcpu_chunk *chunk, *next; const char *err; int slot, off, cpu, ret; unsigned long flags; @@ -1436,11 +1654,14 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, restart: /* search through normal chunks */ for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) { - list_for_each_entry(chunk, &pcpu_slot[slot], list) { + list_for_each_entry_safe(chunk, next, &pcpu_slot[slot], list) { off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); - if (off < 0) + if (off < 0) { + if (slot < PCPU_SLOT_FAIL_THRESHOLD) + pcpu_chunk_move(chunk, 0); continue; + } off = pcpu_alloc_area(chunk, bits, bit_align, off); if (off >= 0) @@ -1499,7 +1720,7 @@ area_found: err = "failed to populate"; goto fail_unlock; } - pcpu_chunk_populated(chunk, rs, re, true); + pcpu_chunk_populated(chunk, rs, re); spin_unlock_irqrestore(&pcpu_lock, flags); } @@ -1698,7 +1919,7 @@ retry_pop: if (!ret) { nr_to_pop -= nr; spin_lock_irq(&pcpu_lock); - pcpu_chunk_populated(chunk, rs, rs + nr, false); + pcpu_chunk_populated(chunk, rs, rs + nr); spin_unlock_irq(&pcpu_lock); } else { nr_to_pop = 0; @@ -1738,6 +1959,7 @@ void free_percpu(void __percpu *ptr) struct pcpu_chunk *chunk; unsigned long flags; int off; + bool need_balance = false; if (!ptr) return; @@ -1759,7 +1981,7 @@ void free_percpu(void __percpu *ptr) list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list) if (pos != chunk) { - pcpu_schedule_balance_work(); + need_balance = true; break; } } @@ -1767,6 +1989,9 @@ void free_percpu(void __percpu *ptr) trace_percpu_free_percpu(chunk->base_addr, off, ptr); spin_unlock_irqrestore(&pcpu_lock, flags); + + if (need_balance) + pcpu_schedule_balance_work(); } EXPORT_SYMBOL_GPL(free_percpu); diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index 4a9aaa3fac8f..6d4a24a7534b 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c @@ -602,13 +602,15 @@ int br_add_if(struct net_bridge *br, struct net_device *dev, call_netdevice_notifiers(NETDEV_JOIN, dev); err = dev_set_allmulti(dev, 1); - if (err) - goto put_back; + if (err) { + kfree(p); /* kobject not yet init'd, manually free */ + goto err1; + } err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj), 
SYSFS_BRIDGE_PORT_ATTR); if (err) - goto err1; + goto err2; err = br_sysfs_addif(p); if (err) @@ -700,12 +702,9 @@ err3: sysfs_remove_link(br->ifobj, p->dev->name); err2: kobject_put(&p->kobj); - p = NULL; /* kobject_put frees */ -err1: dev_set_allmulti(dev, -1); -put_back: +err1: dev_put(dev); - kfree(p); return err; } diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index 4e0091311d40..6b07e4978eb3 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -2153,7 +2153,9 @@ static int compat_copy_entries(unsigned char *data, unsigned int size_user, if (ret < 0) return ret; - WARN_ON(size_remaining); + if (size_remaining) + return -EINVAL; + return state->buf_kern_offset; } diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 9ca784c592ac..548f39dde307 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -734,7 +734,9 @@ bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, flow_keys->nhoff = nhoff; flow_keys->thoff = flow_keys->nhoff; + preempt_disable(); result = BPF_PROG_RUN(prog, ctx); + preempt_enable(); flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, nhoff, hlen); flow_keys->thoff = clamp_t(u16, flow_keys->thoff, diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 0e2f71ab8367..5dd85ec51bfe 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -263,7 +263,6 @@ int dccp_disconnect(struct sock *sk, int flags) struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet = inet_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - int err = 0; const int old_state = sk->sk_state; if (old_state != DCCP_CLOSED) @@ -307,7 +306,7 @@ int dccp_disconnect(struct sock *sk, int flags) WARN_ON(inet->inet_num && !icsk->icsk_bind_hash); sk->sk_error_report(sk); - return err; + return 0; } EXPORT_SYMBOL_GPL(dccp_disconnect); diff --git a/net/dsa/slave.c b/net/dsa/slave.c index fe7b6a62e8f1..9892ca1f6859 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -463,6 +463,8 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev) s->tx_bytes += skb->len; u64_stats_update_end(&s->syncp); + DSA_SKB_CB(skb)->deferred_xmit = false; + /* Identify PTP protocol packets, clone them, and pass them to the * switch driver */ diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index d52db5f2c721..9c3114179690 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -206,10 +206,10 @@ static const struct dsa_device_ops brcm_prepend_netdev_ops = { .rcv = brcm_tag_rcv_prepend, .overhead = BRCM_TAG_LEN, }; -#endif DSA_TAG_DRIVER(brcm_prepend_netdev_ops); MODULE_ALIAS_DSA_TAG_DRIVER(DSA_TAG_PROTO_BRCM_PREPEND); +#endif static struct dsa_tag_driver *dsa_tag_driver_array[] = { #if IS_ENABLED(CONFIG_NET_DSA_TAG_BRCM) diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c index 1601275efe2d..4c2ef42e189c 100644 --- a/net/netfilter/nf_conntrack_h323_asn1.c +++ b/net/netfilter/nf_conntrack_h323_asn1.c @@ -172,7 +172,7 @@ static int nf_h323_error_boundary(struct bitstr *bs, size_t bytes, size_t bits) if (bits % BITS_PER_BYTE > 0) bytes++; - if (*bs->cur + bytes > *bs->end) + if (bs->cur + bytes > bs->end) return 1; return 0; diff --git a/net/netfilter/nf_conntrack_h323_main.c b/net/netfilter/nf_conntrack_h323_main.c index 005589c6d0f6..12de40390e97 100644 --- a/net/netfilter/nf_conntrack_h323_main.c +++ b/net/netfilter/nf_conntrack_h323_main.c @@ -748,24 +748,19 @@ static int callforward_do_filter(struct net *net, } break; } -#if 
IS_ENABLED(CONFIG_NF_CONNTRACK_IPV6) +#if IS_ENABLED(CONFIG_IPV6) case AF_INET6: { - const struct nf_ipv6_ops *v6ops; struct rt6_info *rt1, *rt2; struct flowi6 fl1, fl2; - v6ops = nf_get_ipv6_ops(); - if (!v6ops) - return 0; - memset(&fl1, 0, sizeof(fl1)); fl1.daddr = src->in6; memset(&fl2, 0, sizeof(fl2)); fl2.daddr = dst->in6; - if (!v6ops->route(net, (struct dst_entry **)&rt1, + if (!nf_ip6_route(net, (struct dst_entry **)&rt1, flowi6_to_flowi(&fl1), false)) { - if (!v6ops->route(net, (struct dst_entry **)&rt2, + if (!nf_ip6_route(net, (struct dst_entry **)&rt2, flowi6_to_flowi(&fl2), false)) { if (ipv6_addr_equal(rt6_nexthop(rt1, &fl1.daddr), rt6_nexthop(rt2, &fl2.daddr)) && diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 8dcc064d518d..7db79c1b8084 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1256,7 +1256,7 @@ static int ctnetlink_del_conntrack(struct net *net, struct sock *ctnl, struct nf_conntrack_tuple tuple; struct nf_conn *ct; struct nfgenmsg *nfmsg = nlmsg_data(nlh); - u_int8_t u3 = nfmsg->nfgen_family; + u_int8_t u3 = nfmsg->version ? nfmsg->nfgen_family : AF_UNSPEC; struct nf_conntrack_zone zone; int err; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 7aabfd4b1e50..4469519a4879 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -185,14 +185,25 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = { int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow) { - flow->timeout = (u32)jiffies; + int err; - rhashtable_insert_fast(&flow_table->rhashtable, - &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node, - nf_flow_offload_rhash_params); - rhashtable_insert_fast(&flow_table->rhashtable, - &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node, - nf_flow_offload_rhash_params); + err = rhashtable_insert_fast(&flow_table->rhashtable, + &flow->tuplehash[0].node, + nf_flow_offload_rhash_params); + if (err < 0) + return err; + + err = rhashtable_insert_fast(&flow_table->rhashtable, + &flow->tuplehash[1].node, + nf_flow_offload_rhash_params); + if (err < 0) { + rhashtable_remove_fast(&flow_table->rhashtable, + &flow->tuplehash[0].node, + nf_flow_offload_rhash_params); + return err; + } + + flow->timeout = (u32)jiffies; return 0; } EXPORT_SYMBOL_GPL(flow_offload_add); @@ -232,6 +243,7 @@ flow_offload_lookup(struct nf_flowtable *flow_table, { struct flow_offload_tuple_rhash *tuplehash; struct flow_offload *flow; + struct flow_offload_entry *e; int dir; tuplehash = rhashtable_lookup(&flow_table->rhashtable, tuple, @@ -244,6 +256,10 @@ flow_offload_lookup(struct nf_flowtable *flow_table, if (flow->flags & (FLOW_OFFLOAD_DYING | FLOW_OFFLOAD_TEARDOWN)) return NULL; + e = container_of(flow, struct flow_offload_entry, flow); + if (unlikely(nf_ct_is_dying(e->ct))) + return NULL; + return tuplehash; } EXPORT_SYMBOL_GPL(flow_offload_lookup); @@ -290,8 +306,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow) static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data) { struct nf_flowtable *flow_table = data; + struct flow_offload_entry *e; - if (nf_flow_has_expired(flow) || + e = container_of(flow, struct flow_offload_entry, flow); + if (nf_flow_has_expired(flow) || nf_ct_is_dying(e->ct) || (flow->flags & (FLOW_OFFLOAD_DYING | FLOW_OFFLOAD_TEARDOWN))) flow_offload_del(flow_table, flow); } diff --git a/net/netfilter/nf_flow_table_ip.c 
b/net/netfilter/nf_flow_table_ip.c index 6452550d187f..0d603e20b519 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -181,6 +181,9 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev, iph->protocol != IPPROTO_UDP) return -1; + if (iph->ttl <= 1) + return -1; + thoff = iph->ihl * 4; if (!pskb_may_pull(skb, thoff + sizeof(*ports))) return -1; @@ -408,6 +411,9 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev, ip6h->nexthdr != IPPROTO_UDP) return -1; + if (ip6h->hop_limit <= 1) + return -1; + thoff = sizeof(*ip6h); if (!pskb_may_pull(skb, thoff + sizeof(*ports))) return -1; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index d98416e83d4e..28241e82fd15 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -213,33 +213,33 @@ static int nft_deltable(struct nft_ctx *ctx) return err; } -static int nft_trans_chain_add(struct nft_ctx *ctx, int msg_type) +static struct nft_trans *nft_trans_chain_add(struct nft_ctx *ctx, int msg_type) { struct nft_trans *trans; trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_chain)); if (trans == NULL) - return -ENOMEM; + return ERR_PTR(-ENOMEM); if (msg_type == NFT_MSG_NEWCHAIN) nft_activate_next(ctx->net, ctx->chain); list_add_tail(&trans->list, &ctx->net->nft.commit_list); - return 0; + return trans; } static int nft_delchain(struct nft_ctx *ctx) { - int err; + struct nft_trans *trans; - err = nft_trans_chain_add(ctx, NFT_MSG_DELCHAIN); - if (err < 0) - return err; + trans = nft_trans_chain_add(ctx, NFT_MSG_DELCHAIN); + if (IS_ERR(trans)) + return PTR_ERR(trans); ctx->table->use--; nft_deactivate_next(ctx->net, ctx->chain); - return err; + return 0; } static void nft_rule_expr_activate(const struct nft_ctx *ctx, @@ -1189,6 +1189,9 @@ static int nft_dump_stats(struct sk_buff *skb, struct nft_stats __percpu *stats) u64 pkts, bytes; int cpu; + if (!stats) + return 0; + memset(&total, 0, sizeof(total)); for_each_possible_cpu(cpu) { cpu_stats = per_cpu_ptr(stats, cpu); @@ -1246,6 +1249,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, if (nft_is_base_chain(chain)) { const struct nft_base_chain *basechain = nft_base_chain(chain); const struct nf_hook_ops *ops = &basechain->ops; + struct nft_stats __percpu *stats; struct nlattr *nest; nest = nla_nest_start_noflag(skb, NFTA_CHAIN_HOOK); @@ -1267,8 +1271,9 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net, if (nla_put_string(skb, NFTA_CHAIN_TYPE, basechain->type->name)) goto nla_put_failure; - if (rcu_access_pointer(basechain->stats) && - nft_dump_stats(skb, rcu_dereference(basechain->stats))) + stats = rcu_dereference_check(basechain->stats, + lockdep_commit_lock_is_held(net)); + if (nft_dump_stats(skb, stats)) goto nla_put_failure; } @@ -1615,6 +1620,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, struct nft_base_chain *basechain; struct nft_stats __percpu *stats; struct net *net = ctx->net; + struct nft_trans *trans; struct nft_chain *chain; struct nft_rule **rules; int err; @@ -1662,7 +1668,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, ops->dev = hook.dev; chain->flags |= NFT_BASE_CHAIN; - basechain->policy = policy; + basechain->policy = NF_ACCEPT; } else { chain = kzalloc(sizeof(*chain), GFP_KERNEL); if (chain == NULL) @@ -1698,13 +1704,18 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask, if (err) goto 
err2; - err = nft_trans_chain_add(ctx, NFT_MSG_NEWCHAIN); - if (err < 0) { + trans = nft_trans_chain_add(ctx, NFT_MSG_NEWCHAIN); + if (IS_ERR(trans)) { + err = PTR_ERR(trans); rhltable_remove(&table->chains_ht, &chain->rhlhead, nft_chain_ht_params); goto err2; } + nft_trans_chain_policy(trans) = -1; + if (nft_is_base_chain(chain)) + nft_trans_chain_policy(trans) = policy; + table->use++; list_add_tail_rcu(&chain->list, &table->chains); @@ -6310,6 +6321,27 @@ static int nf_tables_validate(struct net *net) return 0; } +/* a drop policy has to be deferred until all rules have been activated, + * otherwise a large ruleset that contains a drop-policy base chain will + * cause all packets to get dropped until the full transaction has been + * processed. + * + * We defer the drop policy until the transaction has been finalized. + */ +static void nft_chain_commit_drop_policy(struct nft_trans *trans) +{ + struct nft_base_chain *basechain; + + if (nft_trans_chain_policy(trans) != NF_DROP) + return; + + if (!nft_is_base_chain(trans->ctx.chain)) + return; + + basechain = nft_base_chain(trans->ctx.chain); + basechain->policy = NF_DROP; +} + static void nft_chain_commit_update(struct nft_trans *trans) { struct nft_base_chain *basechain; @@ -6631,6 +6663,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN); /* trans destroyed after rcu grace period */ } else { + nft_chain_commit_drop_policy(trans); nft_clear(net, trans->ctx.chain); nf_tables_chain_notify(&trans->ctx, NFT_MSG_NEWCHAIN); nft_trans_destroy(trans); diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index 6e6b9adf7d38..69d7a8439c7a 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -94,8 +94,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, if (help) goto out; - if (ctinfo == IP_CT_NEW || - ctinfo == IP_CT_RELATED) + if (!nf_ct_is_confirmed(ct)) goto out; if (test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status)) @@ -113,6 +112,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, if (ret < 0) goto err_flow_add; + dst_release(route.tuple[!dir].dst); return; err_flow_add: diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c index dd0e97f4f6c0..801872a2e7aa 100644 --- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -728,12 +728,13 @@ static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) DECLARE_SOCKADDR(struct sockaddr_qrtr *, addr, msg->msg_name); int (*enqueue_fn)(struct qrtr_node *, struct sk_buff *, int, struct sockaddr_qrtr *, struct sockaddr_qrtr *); + __le32 qrtr_type = cpu_to_le32(QRTR_TYPE_DATA); struct qrtr_sock *ipc = qrtr_sk(sock->sk); struct sock *sk = sock->sk; struct qrtr_node *node; struct sk_buff *skb; + u32 type = 0; size_t plen; - u32 type = QRTR_TYPE_DATA; int rc; if (msg->msg_flags & ~(MSG_DONTWAIT)) @@ -807,8 +808,8 @@ static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) } /* control messages already require the type as 'command' */ - skb_copy_bits(skb, 0, &type, 4); - type = le32_to_cpu(type); + skb_copy_bits(skb, 0, &qrtr_type, 4); + type = le32_to_cpu(qrtr_type); } rc = enqueue_fn(node, skb, type, &ipc->us, addr); diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 5010a4d5bfba..894cc58c1a03 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -1,7 +1,7 @@ #!/usr/bin/python3 # SPDX-License-Identifier: GPL-2.0-only # -# Copyright (C) 2018 Netronome Systems, Inc. 
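The qrtr change above is an endianness fix: skb_copy_bits() pulls raw little-endian wire bytes, so they must land in a __le32 staging variable and be converted explicitly, rather than copied into a host-order u32 that only happens to be correct on little-endian hosts. A userspace model of the explicit decode; get_le32 is an illustrative helper, not a kernel API:

    #include <stdint.h>
    #include <stdio.h>

    /* explicit little-endian decode, independent of host byte order */
    static uint32_t get_le32(const void *buf)
    {
            const uint8_t *p = buf;

            return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    }

    int main(void)
    {
            /* 4 wire bytes carrying little-endian 5 (e.g. a command type) */
            uint8_t wire[4] = { 0x05, 0x00, 0x00, 0x00 };

            printf("type = %u\n", get_le32(wire)); /* 5 on any host */
            return 0;
    }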
+# Copyright (C) 2018-2019 Netronome Systems, Inc. # In case user attempts to run with Python 2. from __future__ import print_function @@ -39,7 +39,7 @@ class Helper(object): Break down helper function protocol into smaller chunks: return type, name, distincts arguments. """ - arg_re = re.compile('((const )?(struct )?(\w+|...))( (\**)(\w+))?$') + arg_re = re.compile('((\w+ )*?(\w+|...))( (\**)(\w+))?$') res = {} proto_re = re.compile('(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$') @@ -54,8 +54,8 @@ class Helper(object): capture = arg_re.match(a) res['args'].append({ 'type' : capture.group(1), - 'star' : capture.group(6), - 'name' : capture.group(7) + 'star' : capture.group(5), + 'name' : capture.group(6) }) return res diff --git a/scripts/gcc-plugins/arm_ssp_per_task_plugin.c b/scripts/gcc-plugins/arm_ssp_per_task_plugin.c index 89c47f57d1ce..8c1af9bdcb1b 100644 --- a/scripts/gcc-plugins/arm_ssp_per_task_plugin.c +++ b/scripts/gcc-plugins/arm_ssp_per_task_plugin.c @@ -36,7 +36,7 @@ static unsigned int arm_pertask_ssp_rtl_execute(void) mask = GEN_INT(sext_hwi(sp_mask, GET_MODE_PRECISION(Pmode))); masked_sp = gen_reg_rtx(Pmode); - emit_insn_before(gen_rtx_SET(masked_sp, + emit_insn_before(gen_rtx_set(masked_sp, gen_rtx_AND(Pmode, stack_pointer_rtx, mask)), diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index d82b87c16b0a..c61787b15f27 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4649,7 +4649,7 @@ static int selinux_socket_connect_helper(struct socket *sock, struct lsm_network_audit net = {0,}; struct sockaddr_in *addr4 = NULL; struct sockaddr_in6 *addr6 = NULL; - unsigned short snum = 0; + unsigned short snum; u32 sid, perm; /* sctp_connectx(3) calls via selinux_sctp_bind_connect() @@ -4674,12 +4674,12 @@ static int selinux_socket_connect_helper(struct socket *sock, break; default: /* Note that SCTP services expect -EINVAL, whereas - * others must handle this at the protocol level: - * connect(AF_UNSPEC) on a connected socket is - * a documented way disconnect the socket. + * others expect -EAFNOSUPPORT. */ if (sksec->sclass == SECCLASS_SCTP_SOCKET) return -EINVAL; + else + return -EAFNOSUPPORT; } err = sel_netport_sid(sk->sk_protocol, snum, &sid); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 72336bac7573..63e0cf66f01a 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -629,7 +629,7 @@ union bpf_attr { * **BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\ * **->swhash** and *skb*\ **->l4hash** to 0). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -654,7 +654,7 @@ union bpf_attr { * flexibility and can handle sizes larger than 2 or 4 for the * checksum to update. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -686,7 +686,7 @@ union bpf_attr { * flexibility and can handle sizes larger than 2 or 4 for the * checksum to update. 
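The selinux_socket_connect_helper() hunk above makes non-SCTP sockets return a proper -EAFNOSUPPORT for unhandled address families instead of silently falling through. A compressed model of just that policy; the real helper also extracts the port for the inet families first:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* SCTP callers expect -EINVAL, everything else gets -EAFNOSUPPORT */
    static int check_connect_family(int sclass_is_sctp, sa_family_t family)
    {
            switch (family) {
            case AF_INET:
            case AF_INET6:
                    return 0;
            default:
                    return sclass_is_sctp ? -EINVAL : -EAFNOSUPPORT;
            }
    }

    int main(void)
    {
            printf("%d %d\n", check_connect_family(1, AF_UNSPEC),
                   check_connect_family(0, AF_UNSPEC)); /* -22 -97 on Linux */
            return 0;
    }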
* - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -741,7 +741,7 @@ union bpf_attr { * efficient, but it is handled through an action code where the * redirection happens only after the eBPF program has returned. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -806,7 +806,7 @@ union bpf_attr { * **ETH_P_8021Q** and **ETH_P_8021AD**, it is considered to * be **ETH_P_8021Q**. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -818,7 +818,7 @@ union bpf_attr { * Description * Pop a VLAN header from the packet associated to *skb*. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1168,7 +1168,7 @@ union bpf_attr { * All values for *flags* are reserved for future usage, and must * be left at zero. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1281,7 +1281,7 @@ union bpf_attr { * implicitly linearizes, unclones and drops offloads from the * *skb*. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1317,7 +1317,7 @@ union bpf_attr { * **bpf_skb_pull_data()** to effectively unclone the *skb* from * the very beginning in case it is indeed cloned. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1369,7 +1369,7 @@ union bpf_attr { * All values for *flags* are reserved for future usage, and must * be left at zero. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. 
Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1384,7 +1384,7 @@ union bpf_attr { * can be used to prepare the packet for pushing or popping * headers. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1518,20 +1518,20 @@ union bpf_attr { * * **BPF_F_ADJ_ROOM_FIXED_GSO**: Do not adjust gso_size. * Adjusting mss in this way is not allowed for datagrams. * - * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 **: - * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 **: + * * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV4**, + * **BPF_F_ADJ_ROOM_ENCAP_L3_IPV6**: * Any new space is reserved to hold a tunnel header. * Configure skb offsets and other fields accordingly. * - * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE **: - * * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP **: + * * **BPF_F_ADJ_ROOM_ENCAP_L4_GRE**, + * **BPF_F_ADJ_ROOM_ENCAP_L4_UDP**: * Use with ENCAP_L3 flags to further specify the tunnel type. * - * * **BPF_F_ADJ_ROOM_ENCAP_L2(len) **: + * * **BPF_F_ADJ_ROOM_ENCAP_L2**\ (*len*): * Use with ENCAP_L3/L4 flags to further specify the tunnel - * type; **len** is the length of the inner MAC header. + * type; *len* is the length of the inner MAC header. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1610,7 +1610,7 @@ union bpf_attr { * more flexibility as the user is free to store whatever meta * data they need. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1852,7 +1852,7 @@ union bpf_attr { * copied if necessary (i.e. if data was not linear and if start * and end pointers do not point to the same chunk). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -1886,7 +1886,7 @@ union bpf_attr { * only possible to shrink the packet as of this writing, * therefore *delta* must be a negative integer. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2061,18 +2061,18 @@ union bpf_attr { * **BPF_LWT_ENCAP_IP** * IP encapsulation (GRE/GUE/IPIP/etc). The outer header * must be IPv4 or IPv6, followed by zero or more - * additional headers, up to LWT_BPF_MAX_HEADROOM total - * bytes in all prepended headers. 
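For the BPF_F_ADJ_ROOM_ENCAP_* flags documented above, here is a hedged sketch of how a tc program might reserve outer IPv4-plus-GRE headroom. The include paths and section name follow the selftests conventions of this era and may need adjusting to the local toolchain:

    /* sketch: clang -O2 -target bpf -c encap_room.c -o encap_room.o */
    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include "bpf_helpers.h"    /* selftests-style helper header (assumed) */

    SEC("classifier")
    int reserve_gre_room(struct __sk_buff *skb)
    {
            /* 20-byte outer IPv4 header plus 4-byte GRE header */
            int ret = bpf_skb_adjust_room(skb, 24, BPF_ADJ_ROOM_MAC,
                                          BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
                                          BPF_F_ADJ_ROOM_ENCAP_L4_GRE);

            /* all prior pointer checks are invalidated by the helper */
            return ret ? TC_ACT_SHOT : TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";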
Please note that - * if skb_is_gso(skb) is true, no more than two headers - * can be prepended, and the inner header, if present, - * should be either GRE or UDP/GUE. - * - * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of - * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called - * by bpf programs of types BPF_PROG_TYPE_LWT_IN and - * BPF_PROG_TYPE_LWT_XMIT. - * - * A call to this helper is susceptible to change the underlaying + * additional headers, up to **LWT_BPF_MAX_HEADROOM** + * total bytes in all prepended headers. Please note that + * if **skb_is_gso**\ (*skb*) is true, no more than two + * headers can be prepended, and the inner header, if + * present, should be either GRE or UDP/GUE. + * + * **BPF_LWT_ENCAP_SEG6**\ \* types can be called by BPF programs + * of type **BPF_PROG_TYPE_LWT_IN**; **BPF_LWT_ENCAP_IP** type can + * be called by bpf programs of types **BPF_PROG_TYPE_LWT_IN** and + * **BPF_PROG_TYPE_LWT_XMIT**. + * + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2087,7 +2087,7 @@ union bpf_attr { * inside the outermost IPv6 Segment Routing Header can be * modified through this helper. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2103,7 +2103,7 @@ union bpf_attr { * after the segments are accepted. *delta* can be as well * positive (growing) as negative (shrinking). * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2126,13 +2126,13 @@ union bpf_attr { * Type of *param*: **int**. * **SEG6_LOCAL_ACTION_END_B6** * End.B6 action: Endpoint bound to an SRv6 policy. - * Type of param: **struct ipv6_sr_hdr**. + * Type of *param*: **struct ipv6_sr_hdr**. * **SEG6_LOCAL_ACTION_END_B6_ENCAP** * End.B6.Encap action: Endpoint bound to an SRv6 * encapsulation policy. - * Type of param: **struct ipv6_sr_hdr**. + * Type of *param*: **struct ipv6_sr_hdr**. * - * A call to this helper is susceptible to change the underlaying + * A call to this helper is susceptible to change the underlying * packet buffer. Therefore, at load time, all checks on pointers * previously done by the verifier are invalidated and must be * performed again, if the helper is used in combination with @@ -2285,7 +2285,8 @@ union bpf_attr { * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. * For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. * * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u64 netns, u64 flags) * Description @@ -2321,7 +2322,8 @@ union bpf_attr { * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. 
* For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. * * int bpf_sk_release(struct bpf_sock *sock) * Description @@ -2490,31 +2492,34 @@ union bpf_attr { * network namespace *netns*. The return value must be checked, * and if non-**NULL**, released via **bpf_sk_release**\ (). * - * This function is identical to bpf_sk_lookup_tcp, except that it - * also returns timewait or request sockets. Use bpf_sk_fullsock - * or bpf_tcp_socket to access the full structure. + * This function is identical to **bpf_sk_lookup_tcp**\ (), except + * that it also returns timewait or request sockets. Use + * **bpf_sk_fullsock**\ () or **bpf_tcp_sock**\ () to access the + * full structure. * * This helper is available only if the kernel was compiled with * **CONFIG_NET** configuration option. * Return * Pointer to **struct bpf_sock**, or **NULL** in case of failure. * For sockets with reuseport option, the **struct bpf_sock** - * result is from **reuse->socks**\ [] using the hash of the tuple. + * result is from *reuse*\ **->socks**\ [] using the hash of the + * tuple. * * int bpf_tcp_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len) * Description - * Check whether iph and th contain a valid SYN cookie ACK for - * the listening socket in sk. + * Check whether *iph* and *th* contain a valid SYN cookie ACK for + * the listening socket in *sk*. * - * iph points to the start of the IPv4 or IPv6 header, while - * iph_len contains sizeof(struct iphdr) or sizeof(struct ip6hdr). + * *iph* points to the start of the IPv4 or IPv6 header, while + * *iph_len* contains **sizeof**\ (**struct iphdr**) or + * **sizeof**\ (**struct ip6hdr**). * - * th points to the start of the TCP header, while th_len contains - * sizeof(struct tcphdr). + * *th* points to the start of the TCP header, while *th_len* + * contains **sizeof**\ (**struct tcphdr**). * * Return - * 0 if iph and th are a valid SYN cookie ACK, or a negative error - * otherwise. + * 0 if *iph* and *th* are a valid SYN cookie ACK, or a negative + * error otherwise. * * int bpf_sysctl_get_name(struct bpf_sysctl *ctx, char *buf, size_t buf_len, u64 flags) * Description @@ -2592,17 +2597,17 @@ union bpf_attr { * and save the result in *res*. * * The string may begin with an arbitrary amount of white space - * (as determined by isspace(3)) followed by a single optional '-' - * sign. + * (as determined by **isspace**\ (3)) followed by a single + * optional '**-**' sign. * * Five least significant bits of *flags* encode base, other bits * are currently unused. * * Base must be either 8, 10, 16 or 0 to detect it automatically - * similar to user space strtol(3). + * similar to user space **strtol**\ (3). * Return * Number of characters consumed on success. Must be positive but - * no more than buf_len. + * no more than *buf_len*. * * **-EINVAL** if no valid digits were found or unsupported base * was provided. @@ -2616,16 +2621,16 @@ union bpf_attr { * given base and save the result in *res*. * * The string may begin with an arbitrary amount of white space - * (as determined by isspace(3)). + * (as determined by **isspace**\ (3)). * * Five least significant bits of *flags* encode base, other bits * are currently unused. * * Base must be either 8, 10, 16 or 0 to detect it automatically - * similar to user space strtoul(3). + * similar to user space **strtoul**\ (3). 
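The bpf_strtol()/bpf_strtoul() base handling documented above mirrors user space: base 0 auto-detects octal and hex prefixes. The libc behaviour it is compared against, as a trivial demo:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            /* base 0: "0x" prefix means hex, leading "0" means octal */
            printf("%ld %ld %ld\n",
                   strtol("0x1f", NULL, 0),  /* 31 */
                   strtol("017", NULL, 0),   /* 15 */
                   strtol("-42", NULL, 0));  /* -42 */
            return 0;
    }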
* Return * Number of characters consumed on success. Must be positive but - * no more than buf_len. + * no more than *buf_len*. * * **-EINVAL** if no valid digits were found or unsupported base * was provided. @@ -2634,26 +2639,26 @@ union bpf_attr { * * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags) * Description - * Get a bpf-local-storage from a sk. + * Get a bpf-local-storage from a *sk*. * * Logically, it could be thought of getting the value from * a *map* with *sk* as the **key**. From this * perspective, the usage is not much different from - * **bpf_map_lookup_elem(map, &sk)** except this - * helper enforces the key must be a **bpf_fullsock()** - * and the map must be a BPF_MAP_TYPE_SK_STORAGE also. + * **bpf_map_lookup_elem**\ (*map*, **&**\ *sk*) except this + * helper enforces the key must be a full socket and the map must + * be a **BPF_MAP_TYPE_SK_STORAGE** also. * * Underneath, the value is stored locally at *sk* instead of - * the map. The *map* is used as the bpf-local-storage **type**. - * The bpf-local-storage **type** (i.e. the *map*) is searched - * against all bpf-local-storages residing at sk. + * the *map*. The *map* is used as the bpf-local-storage + * "type". The bpf-local-storage "type" (i.e. the *map*) is + * searched against all bpf-local-storages residing at *sk*. * - * An optional *flags* (BPF_SK_STORAGE_GET_F_CREATE) can be + * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be * used such that a new bpf-local-storage will be * created if one does not exist. *value* can be used - * together with BPF_SK_STORAGE_GET_F_CREATE to specify + * together with **BPF_SK_STORAGE_GET_F_CREATE** to specify * the initial value of a bpf-local-storage. If *value* is - * NULL, the new bpf-local-storage will be zero initialized. + * **NULL**, the new bpf-local-storage will be zero initialized. * Return * A bpf-local-storage pointer is returned on success. * @@ -2662,7 +2667,7 @@ union bpf_attr { * * int bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk) * Description - * Delete a bpf-local-storage from a sk. + * Delete a bpf-local-storage from a *sk*. * Return * 0 on success. * diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 11a65db4b93f..7e3b79d7c25f 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -44,6 +44,7 @@ #include "btf.h" #include "str_error.h" #include "libbpf_util.h" +#include "libbpf_internal.h" #ifndef EM_BPF #define EM_BPF 247 @@ -128,6 +129,10 @@ struct bpf_capabilities { __u32 name:1; /* v5.2: kernel support for global data sections. 
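bpf_object__sanitize_btf() below downgrades BTF kinds that the running kernel rejects into older equivalents, as recorded by the btf_func/btf_datasec capability bits. The kind mapping in isolation, with kind numbers as in uapi/linux/btf.h:

    #include <stdio.h>

    /* BTF kind numbers as in uapi/linux/btf.h */
    enum {
            KIND_INT = 1, KIND_STRUCT = 4, KIND_ENUM = 6, KIND_TYPEDEF = 8,
            KIND_FUNC = 12, KIND_FUNC_PROTO = 13, KIND_VAR = 14,
            KIND_DATASEC = 15,
    };

    /* downgrade kinds the running kernel rejects into older equivalents */
    static int sanitize_kind(int kind, int has_func, int has_datasec)
    {
            if (!has_datasec && kind == KIND_VAR)
                    return KIND_INT;        /* VAR becomes a plain int */
            if (!has_datasec && kind == KIND_DATASEC)
                    return KIND_STRUCT;     /* DATASEC becomes a struct */
            if (!has_func && kind == KIND_FUNC_PROTO)
                    return KIND_ENUM;       /* FUNC_PROTO becomes an enum */
            if (!has_func && kind == KIND_FUNC)
                    return KIND_TYPEDEF;    /* FUNC becomes a typedef */
            return kind;
    }

    int main(void)
    {
            printf("%d %d\n", sanitize_kind(KIND_VAR, 1, 0),
                   sanitize_kind(KIND_FUNC, 0, 1)); /* 1 8 */
            return 0;
    }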
*/ __u32 global_data:1; + /* BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO support */ + __u32 btf_func:1; + /* BTF_KIND_VAR and BTF_KIND_DATASEC support */ + __u32 btf_datasec:1; }; /* @@ -1021,6 +1026,74 @@ static bool section_have_execinstr(struct bpf_object *obj, int idx) return false; } +static void bpf_object__sanitize_btf(struct bpf_object *obj) +{ + bool has_datasec = obj->caps.btf_datasec; + bool has_func = obj->caps.btf_func; + struct btf *btf = obj->btf; + struct btf_type *t; + int i, j, vlen; + __u16 kind; + + if (!obj->btf || (has_func && has_datasec)) + return; + + for (i = 1; i <= btf__get_nr_types(btf); i++) { + t = (struct btf_type *)btf__type_by_id(btf, i); + kind = BTF_INFO_KIND(t->info); + + if (!has_datasec && kind == BTF_KIND_VAR) { + /* replace VAR with INT */ + t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); + t->size = sizeof(int); + *(int *)(t+1) = BTF_INT_ENC(0, 0, 32); + } else if (!has_datasec && kind == BTF_KIND_DATASEC) { + /* replace DATASEC with STRUCT */ + struct btf_var_secinfo *v = (void *)(t + 1); + struct btf_member *m = (void *)(t + 1); + struct btf_type *vt; + char *name; + + name = (char *)btf__name_by_offset(btf, t->name_off); + while (*name) { + if (*name == '.') + *name = '_'; + name++; + } + + vlen = BTF_INFO_VLEN(t->info); + t->info = BTF_INFO_ENC(BTF_KIND_STRUCT, 0, vlen); + for (j = 0; j < vlen; j++, v++, m++) { + /* order of field assignments is important */ + m->offset = v->offset * 8; + m->type = v->type; + /* preserve variable name as member name */ + vt = (void *)btf__type_by_id(btf, v->type); + m->name_off = vt->name_off; + } + } else if (!has_func && kind == BTF_KIND_FUNC_PROTO) { + /* replace FUNC_PROTO with ENUM */ + vlen = BTF_INFO_VLEN(t->info); + t->info = BTF_INFO_ENC(BTF_KIND_ENUM, 0, vlen); + t->size = sizeof(__u32); /* kernel enforced */ + } else if (!has_func && kind == BTF_KIND_FUNC) { + /* replace FUNC with TYPEDEF */ + t->info = BTF_INFO_ENC(BTF_KIND_TYPEDEF, 0, 0); + } + } +} + +static void bpf_object__sanitize_btf_ext(struct bpf_object *obj) +{ + if (!obj->btf_ext) + return; + + if (!obj->caps.btf_func) { + btf_ext__free(obj->btf_ext); + obj->btf_ext = NULL; + } +} + static int bpf_object__elf_collect(struct bpf_object *obj, int flags) { Elf *elf = obj->efile.elf; @@ -1164,8 +1237,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) obj->btf = NULL; } else { err = btf__finalize_data(obj, obj->btf); - if (!err) + if (!err) { + bpf_object__sanitize_btf(obj); err = btf__load(obj->btf); + } if (err) { pr_warning("Error finalizing and loading %s into kernel: %d. 
Ignored and continue.\n", BTF_ELF_SEC, err); @@ -1187,6 +1262,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags) BTF_EXT_ELF_SEC, PTR_ERR(obj->btf_ext)); obj->btf_ext = NULL; + } else { + bpf_object__sanitize_btf_ext(obj); } } } @@ -1556,12 +1633,63 @@ bpf_object__probe_global_data(struct bpf_object *obj) return 0; } +static int bpf_object__probe_btf_func(struct bpf_object *obj) +{ + const char strs[] = "\0int\0x\0a"; + /* void x(int a) {} */ + __u32 types[] = { + /* int */ + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ + /* FUNC_PROTO */ /* [2] */ + BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0), + BTF_PARAM_ENC(7, 1), + /* FUNC x */ /* [3] */ + BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), + }; + int res; + + res = libbpf__probe_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs)); + if (res < 0) + return res; + if (res > 0) + obj->caps.btf_func = 1; + return 0; +} + +static int bpf_object__probe_btf_datasec(struct bpf_object *obj) +{ + const char strs[] = "\0x\0.data"; + /* static int a; */ + __u32 types[] = { + /* int */ + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ + /* VAR x */ /* [2] */ + BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1), + BTF_VAR_STATIC, + /* DATASEC val */ /* [3] */ + BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), + BTF_VAR_SECINFO_ENC(2, 0, 4), + }; + int res; + + res = libbpf__probe_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs)); + if (res < 0) + return res; + if (res > 0) + obj->caps.btf_datasec = 1; + return 0; +} + static int bpf_object__probe_caps(struct bpf_object *obj) { int (*probe_fn[])(struct bpf_object *obj) = { bpf_object__probe_name, bpf_object__probe_global_data, + bpf_object__probe_btf_func, + bpf_object__probe_btf_datasec, }; int i, ret; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h new file mode 100644 index 000000000000..789e435b5900 --- /dev/null +++ b/tools/lib/bpf/libbpf_internal.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ + +/* + * Internal libbpf helpers. 
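libbpf__probe_raw_btf(), defined further down, assembles a throwaway BTF object as header | type section | string section and asks the kernel to load it; a positive result sets the corresponding capability bit. A standalone model of the blob layout, with the struct and constants mirroring uapi/linux/btf.h (the real probe then hands the blob to bpf_load_btf()):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BTF_MAGIC   0xeB9F
    #define BTF_VERSION 1

    struct btf_header {         /* mirrors uapi/linux/btf.h */
            uint16_t magic;
            uint8_t  version;
            uint8_t  flags;
            uint32_t hdr_len;
            uint32_t type_off;  /* offsets are relative to the header end */
            uint32_t type_len;
            uint32_t str_off;
            uint32_t str_len;
    };

    /* lay out header | type section | string section in one allocation */
    static uint8_t *build_raw_btf(const void *types, uint32_t types_len,
                                  const char *strs, uint32_t str_len,
                                  uint32_t *out_len)
    {
            struct btf_header hdr = {
                    .magic = BTF_MAGIC,
                    .version = BTF_VERSION,
                    .hdr_len = sizeof(hdr),
                    .type_len = types_len,
                    .str_off = types_len,   /* strings follow the types */
                    .str_len = str_len,
            };
            uint8_t *raw = malloc(sizeof(hdr) + types_len + str_len);

            if (!raw)
                    return NULL;
            memcpy(raw, &hdr, sizeof(hdr));
            memcpy(raw + sizeof(hdr), types, types_len);
            memcpy(raw + sizeof(hdr) + types_len, strs, str_len);
            *out_len = sizeof(hdr) + types_len + str_len;
            return raw;
    }

    int main(void)
    {
            const uint32_t types[] = { 0 }; /* no types, just the layout */
            const char strs[] = "\0int";
            uint32_t len;
            uint8_t *blob = build_raw_btf(types, 0, strs, sizeof(strs), &len);

            printf("raw BTF blob: %u bytes\n", blob ? len : 0);
            free(blob);
            return 0;
    }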
+ * + * Copyright (c) 2019 Facebook + */ + +#ifndef __LIBBPF_LIBBPF_INTERNAL_H +#define __LIBBPF_LIBBPF_INTERNAL_H + +#define BTF_INFO_ENC(kind, kind_flag, vlen) \ + ((!!(kind_flag) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN)) +#define BTF_TYPE_ENC(name, info, size_or_type) (name), (info), (size_or_type) +#define BTF_INT_ENC(encoding, bits_offset, nr_bits) \ + ((encoding) << 24 | (bits_offset) << 16 | (nr_bits)) +#define BTF_TYPE_INT_ENC(name, encoding, bits_offset, bits, sz) \ + BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz), \ + BTF_INT_ENC(encoding, bits_offset, bits) +#define BTF_MEMBER_ENC(name, type, bits_offset) (name), (type), (bits_offset) +#define BTF_PARAM_ENC(name, type) (name), (type) +#define BTF_VAR_SECINFO_ENC(type, offset, size) (type), (offset), (size) + +int libbpf__probe_raw_btf(const char *raw_types, size_t types_len, + const char *str_sec, size_t str_len); + +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */ diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index a2c64a9ce1a6..5e2aa83f637a 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -15,6 +15,7 @@ #include "bpf.h" #include "libbpf.h" +#include "libbpf_internal.h" static bool grep(const char *buffer, const char *pattern) { @@ -132,21 +133,43 @@ bool bpf_probe_prog_type(enum bpf_prog_type prog_type, __u32 ifindex) return errno != EINVAL && errno != EOPNOTSUPP; } -static int load_btf(void) +int libbpf__probe_raw_btf(const char *raw_types, size_t types_len, + const char *str_sec, size_t str_len) { -#define BTF_INFO_ENC(kind, kind_flag, vlen) \ - ((!!(kind_flag) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN)) -#define BTF_TYPE_ENC(name, info, size_or_type) \ - (name), (info), (size_or_type) -#define BTF_INT_ENC(encoding, bits_offset, nr_bits) \ - ((encoding) << 24 | (bits_offset) << 16 | (nr_bits)) -#define BTF_TYPE_INT_ENC(name, encoding, bits_offset, bits, sz) \ - BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz), \ - BTF_INT_ENC(encoding, bits_offset, bits) -#define BTF_MEMBER_ENC(name, type, bits_offset) \ - (name), (type), (bits_offset) - - const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l"; + struct btf_header hdr = { + .magic = BTF_MAGIC, + .version = BTF_VERSION, + .hdr_len = sizeof(struct btf_header), + .type_len = types_len, + .str_off = types_len, + .str_len = str_len, + }; + int btf_fd, btf_len; + __u8 *raw_btf; + + btf_len = hdr.hdr_len + hdr.type_len + hdr.str_len; + raw_btf = malloc(btf_len); + if (!raw_btf) + return -ENOMEM; + + memcpy(raw_btf, &hdr, sizeof(hdr)); + memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len); + memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len); + + btf_fd = bpf_load_btf(raw_btf, btf_len, NULL, 0, false); + if (btf_fd < 0) { + free(raw_btf); + return 0; + } + + close(btf_fd); + free(raw_btf); + return 1; +} + +static int load_sk_storage_btf(void) +{ + const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l"; /* struct bpf_spin_lock { * int val; * }; @@ -155,7 +178,7 @@ static int load_btf(void) * struct bpf_spin_lock l; * }; */ - __u32 btf_raw_types[] = { + __u32 types[] = { /* int */ BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ /* struct bpf_spin_lock */ /* [2] */ @@ -166,23 +189,9 @@ static int load_btf(void) BTF_MEMBER_ENC(19, 1, 0), /* int cnt; */ BTF_MEMBER_ENC(23, 2, 32),/* struct bpf_spin_lock l; */ }; - struct btf_header btf_hdr = { - .magic = BTF_MAGIC, - .version = BTF_VERSION, - .hdr_len = sizeof(struct btf_header), - .type_len = sizeof(btf_raw_types), - 
.str_off = sizeof(btf_raw_types), - .str_len = sizeof(btf_str_sec), - }; - __u8 raw_btf[sizeof(struct btf_header) + sizeof(btf_raw_types) + - sizeof(btf_str_sec)]; - - memcpy(raw_btf, &btf_hdr, sizeof(btf_hdr)); - memcpy(raw_btf + sizeof(btf_hdr), btf_raw_types, sizeof(btf_raw_types)); - memcpy(raw_btf + sizeof(btf_hdr) + sizeof(btf_raw_types), - btf_str_sec, sizeof(btf_str_sec)); - return bpf_load_btf(raw_btf, sizeof(raw_btf), 0, 0, 0); + return libbpf__probe_raw_btf((char *)types, sizeof(types), + strs, sizeof(strs)); } bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex) @@ -222,7 +231,7 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex) value_size = 8; max_entries = 0; map_flags = BPF_F_NO_PREALLOC; - btf_fd = load_btf(); + btf_fd = load_sk_storage_btf(); if (btf_fd < 0) return false; break; diff --git a/tools/power/x86/turbostat/Makefile b/tools/power/x86/turbostat/Makefile index 1598b4fa0b11..045f5f7d68ab 100644 --- a/tools/power/x86/turbostat/Makefile +++ b/tools/power/x86/turbostat/Makefile @@ -9,7 +9,7 @@ ifeq ("$(origin O)", "command line") endif turbostat : turbostat.c -override CFLAGS += -Wall +override CFLAGS += -Wall -I../../../include override CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"' override CFLAGS += -DINTEL_FAMILY_HEADER='"../../../../arch/x86/include/asm/intel-family.h"' diff --git a/tools/power/x86/x86_energy_perf_policy/Makefile b/tools/power/x86/x86_energy_perf_policy/Makefile index ae7a0e09b722..1fdeef864e7c 100644 --- a/tools/power/x86/x86_energy_perf_policy/Makefile +++ b/tools/power/x86/x86_energy_perf_policy/Makefile @@ -9,7 +9,7 @@ ifeq ("$(origin O)", "command line") endif x86_energy_perf_policy : x86_energy_perf_policy.c -override CFLAGS += -Wall +override CFLAGS += -Wall -I../../../include override CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/asm/msr-index.h"' %: %.c diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 41e8a689aa77..a877803e4ba8 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -32,3 +32,5 @@ test_tcpnotify_user test_libbpf test_tcp_check_syncookie_user alu32 +libbpf.pc +libbpf.so.* diff --git a/tools/testing/selftests/bpf/verifier/jump.c b/tools/testing/selftests/bpf/verifier/jump.c index 8e6fcc8940f0..6f951d1ff0a4 100644 --- a/tools/testing/selftests/bpf/verifier/jump.c +++ b/tools/testing/selftests/bpf/verifier/jump.c @@ -178,3 +178,198 @@ .result_unpriv = REJECT, .result = ACCEPT, }, +{ + "jump test 6", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_MOV64_IMM(BPF_REG_1, 2), + BPF_JMP_IMM(BPF_JA, 0, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 2), + BPF_EXIT_INSN(), + BPF_JMP_REG(BPF_JNE, BPF_REG_0, BPF_REG_1, 16), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, 0), + BPF_JMP_IMM(BPF_JA, 0, 0, -20), + }, + .result = ACCEPT, + .retval = 2, +}, +{ + "jump test 7", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_JMP_IMM(BPF_JA, 0, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 3), + BPF_EXIT_INSN(), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 2, 16), + 
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_JMP_IMM(BPF_JA, 0, 0, -20),
+	},
+	.result = ACCEPT,
+	.retval = 3,
+},
+{
+	"jump test 8",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_MOV64_IMM(BPF_REG_1, 2),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_JMP_REG(BPF_JNE, BPF_REG_0, BPF_REG_1, 16),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_JMP_IMM(BPF_JA, 0, 0, -20),
+	},
+	.result = ACCEPT,
+	.retval = 3,
+},
+{
+	"jump/call test 9",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 2, 16),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -20),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.result = REJECT,
+	.errstr = "jump out of range from insn 1 to 4",
+},
+{
+	"jump/call test 10",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 2, 16),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -20),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.result = REJECT,
+	.errstr = "last insn is not an exit or jmp",
+},
+{
+	"jump/call test 11",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_0, 3),
+	BPF_EXIT_INSN(),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 2, 26),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 42),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -31),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.result = ACCEPT,
+	.retval = 3,
+},
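
Note on the libbpf change above: libbpf__probe_raw_btf() wraps a hand-encoded type
section and string section in a struct btf_header, hands the blob to bpf_load_btf(),
and returns 1 if the kernel accepted it, 0 if it was rejected, or -ENOMEM if the
buffer allocation failed. A minimal sketch of what another in-tree caller could look
like, using the BTF_* encoding macros now shared through libbpf_internal.h; the
probe_int_btf() name is illustrative only and is not part of this series:

	/* Sketch, not from the patch: encode a single signed 32-bit "int"
	 * type and ask the kernel to load it, as a feature probe.
	 */
	#include <linux/btf.h>
	#include "libbpf_internal.h"

	static int probe_int_btf(void)
	{
		/* String section: offset 0 is the mandatory empty string,
		 * offset 1 is "int", referenced by the type's name_off.
		 */
		const char strs[] = "\0int";
		__u32 types[] = {
			/* int, signed, bits_offset 0, 32 bits, 4 bytes */
			BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
		};

		/* 1: kernel accepted the BTF, 0: rejected, <0: no memory */
		return libbpf__probe_raw_btf((char *)types, sizeof(types),
					     strs, sizeof(strs));
	}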
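
Note on the new jump tests: they all share one shape, a conditional branch hops over
a padded block of no-op moves and lands on a backward BPF_JA (or, in tests 9-11, a
BPF_CALL with a negative offset), exercising the verifier's handling of forward and
backward control flow. A standalone sketch of the same forward-then-backward pattern,
assuming the eBPF instruction macros from tools/include/linux/filter.h and the
bpf_load_program()/bpf_prog_test_run() helpers from libbpf's bpf.h (error handling is
abbreviated; a BPF_JA target is pc + 1 + offset, so the program below returns 2):

	#include <stdio.h>
	#include <linux/filter.h>	/* eBPF insn macros (tools/include copy) */
	#include "bpf.h"		/* bpf_load_program(), bpf_prog_test_run() */

	int main(void)
	{
		struct bpf_insn insns[] = {
			BPF_MOV64_IMM(BPF_REG_0, 1),	/* insn 0 */
			BPF_JMP_IMM(BPF_JA, 0, 0, 2),	/* insn 1: forward to insn 4 */
			BPF_MOV64_IMM(BPF_REG_0, 2),	/* insn 2: via backward jump */
			BPF_EXIT_INSN(),		/* insn 3: return R0 == 2 */
			BPF_JMP_IMM(BPF_JA, 0, 0, -3),	/* insn 4: backward to insn 2 */
		};
		char log[4096];
		__u8 pkt[64] = {};	/* dummy packet for BPF_PROG_TEST_RUN */
		__u32 retval = 0;
		int fd;

		fd = bpf_load_program(BPF_PROG_TYPE_SCHED_CLS, insns,
				      sizeof(insns) / sizeof(insns[0]), "GPL",
				      0, log, sizeof(log));
		if (fd < 0) {
			fprintf(stderr, "verifier rejected:\n%s\n", log);
			return 1;
		}
		if (!bpf_prog_test_run(fd, 1, pkt, sizeof(pkt), NULL, NULL,
				       &retval, NULL))
			printf("retval = %u (expected 2)\n", retval);
		return 0;
	}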