summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'fixes-for-linus' of ↵Linus Torvalds2015-12-1221-32/+76
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "Here are a bunch of small bug fixes for various ARM platforms, nothing really sticks out this week, most of either fixes bugs in code that was just added in 4.4, or that has been broken for many years without anyone noticing. at91/sama5d2: - fix sama5de hardware setup of sd/mmc interface - proper selection of pinctrl drivers. PIO4 is necessary for sama5d2 berlin: - fix incorrect clock input for SDIO exynos: - Fix potential NULL pointer dereference in Exynos PMU driver. imx: - Fix vf610 SAI clock configuration bug which is discovered by the newly added master mode support in SAI audio driver. - Fix buggy L2 cache latency values in vf610 device trees, which may cause system hang when cpu runs at a higher frequency. ixp4xx: - fix prototypes for readl/writel functions ls2080a: - use little-endian register access for GPIO and SDHCI omap: - Fix clock source for ARM TWD and global timers on am437x - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when MACH_OMAP3_PANDORA is selected - Fix SPI DMA handles for dm816x as only some were mapped - Fix up mbox cells for dm816x to make mailbox usable pxa: - use PWM lookup table for all ezx machines s3c24xx: - Remove incorrect __init annotation from s3c24xx cpufreq driver structures. versatile: - fix PCI IRQ mapping on Versatile PB" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ls2080a/dts: Add little endian property for GPIO IP block dt-bindings: define little-endian property for QorIQ GPIO ARM64: dts: ls2080a: fix eSDHC endianness ARM: dts: vf610: use reset values for L2 cache latencies ARM: pxa: use PWM lookup table for all machines ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1 ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock ARM: dts: am4372: fix clock source for arm twd and global timers ARM: at91: fix pinctrl driver selection ARM: at91/dt: add always-on to 1.8V regulator ARM: dts: vf610: fix clock definition for SAI2 ARM: imx: clk-vf610: fix SAI clock tree ARM: ixp4xx: fix read{b,w,l} return types irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PB ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE ARM: dts: add dm816x missing spi DT dma handles ARM: dts: add dm816x missing #mbox-cells cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
| * Merge tag 'imx-fixes-4.4-2' of ↵Kevin Hilman2015-12-114-12/+9
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Merge "ARM: imx: fixes for 4.4, 2nd round" from Shawn Guo: The i.MX fixes for 4.4, 2nd round: - Fix vf610 SAI clock configuration bug which is discovered by the newly added master mode support in SAI audio driver. - Fix buggy L2 cache latency values in vf610 device trees, which may cause system hang when cpu runs at a higher frequency. * tag 'imx-fixes-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: vf610: use reset values for L2 cache latencies ARM: dts: vf610: fix clock definition for SAI2 ARM: imx: clk-vf610: fix SAI clock tree
| | * ARM: dts: vf610: use reset values for L2 cache latenciesStefan Agner2015-12-112-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux on Vybrid used several different L2 latencies so far, none of them seem to be the right ones. According to the application note AN4947 ("Understanding Vybrid Architecture"), the tag portion runs on CPU clock and is inside the L2 cache controller, whereas the data portion is stored in the external SRAM running on platform clock. Hence it is likely that the correct value requires a higher data latency then tag latency. These are the values which have been used so far: - The mainline values: arm,data-latency = <1 1 1>; arm,tag-latency = <2 2 2>; Those values have lead to problems on higher clocks. They look like a poor translation from the reset values (missing +1 offset and a mix up between tag/latency values). - The Linux 3.0 (SoC vendor BSP) values (converted to DT notation): arm,data-latency = <4 2 3> arm,tag-latency = <4 2 3> The cache initialization function along with the value matches the i.MX6 code from the same kernel, so it seems that those values have just been copied. - The Colibri values: arm,data-latency = <2 1 2>; arm,tag-latency = <3 2 3>; Those were a mix between the values of the Linux 3.0 based BSP and the mainline values above. - The SoC Reset values (converted to DT notation): arm,data-latency = <3 3 3>; arm,tag-latency = <2 2 2>; So far there is no official statement on what the correct values are. See also the related Freescale community thread: https://community.freescale.com/message/579785#579785 For now, the reset values seem to be the best bet. Remove all other "bogus" values and use the reset value on vf610.dtsi level. Signed-off-by: Stefan Agner <stefan@agner.ch> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
| | * ARM: dts: vf610: fix clock definition for SAI2Stefan Agner2015-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, only the bus clock has been assigned, but in reality the SAI IP has for clock inputs. The driver has been updated to make use of the additional clock inputs by c3ecef21c3f2 ("ASoC: fsl_sai: add sai master mode support"). Due to a bug in the clock tree, the audio clock has been enabled none the less by the specified bus clock (see "ARM: imx: clk-vf610: fix SAI clock tree"), which made master mode even without the proper clock assigned working. This patch completes the clock definition for SAI2. On Vybrid, only two MCLK out of the four options are available (the first being the bus clock itself). See chapter 8.10.1.2.3 of the Vybrid Reference manual ("SAI transmitter and receiver options for MCLK selection"). Note: The audio clocks are only required in master mode. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
| | * ARM: imx: clk-vf610: fix SAI clock treeStefan Agner2015-12-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Synchronous Audio Interface (SAI) instances are clocked by independent clocks: The bus clock and the audio clock (as shown in Figure 51-1 in the Vybrid Reference Manual). The clock gates in CCGR0/CCGR1 for SAI0 through SAI3 are bus clock gates, as access tests to the registers with/without gating those clocks have shown. The audio clock is gated by the SAIx_EN gates in CCM_CSCDR1, followed by a clock divider (SAIx_DIV). Currently, the parent of the bus clock gates has been assigned to SAIx_DIV, which is not involved in the bus clock path for the SAI instances (see chapter 9.10.12, SAI clocking in the Vybrid Reference Manual). Fix this by define the parent clock of VF610_CLK_SAIx to be the bus clock. If the driver needs the audio clock (when used in master mode), a fixed device tree is required which assign the audio clock properly to VF610_CLK_SAIx_DIV. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
| * | ls2080a/dts: Add little endian property for GPIO IP blockLiu Gang2015-12-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GPIO block for ls2080a platform has little endian registers, the GPIO driver needs this property to read/write registers by right interface. Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
| * | dt-bindings: define little-endian property for QorIQ GPIOLi Yang2015-12-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GPIO block on different QorIQ chips could have registers in different endianess. Define the property to specify which endian is used by the hardware. Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
| * | ARM64: dts: ls2080a: fix eSDHC endiannessyangbo lu2015-12-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the "little-endian" property to fix the issue that eSDHC is not working and dumping out "mmc0: Controller never released inhibit bit(s)." error messages constantly. Fixes: 5461597f6ce0 ("dts/ls2080a: Update DTSI to add support of various peripherals") Signed-off-by: Yangbo Lu <yangbo.lu@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
| * | Merge tag 'omap-for-v4.4/fixes-rc4' of ↵Arnd Bergmann2015-12-114-5/+17
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "omap fixes against v4.4-rc4" from Tony Lindgren Few fixes for omaps for v4.4-rc cycle: - Fix clock source for ARM TWD and global timers on am437x - Always select REGULATOR_FIXED_VOLTAGE for omap2+ instead of when MACH_OMAP3_PANDORA is selected - Fix SPI DMA handles for dm816x as only some were mapped - Fix up mbox cells for dm816x to make mailbox usable * tag 'omap-for-v4.4/fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: am4372: fix clock source for arm twd and global timers ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGE ARM: dts: add dm816x missing spi DT dma handles ARM: dts: add dm816x missing #mbox-cells
| | * | ARM: dts: am4372: fix clock source for arm twd and global timersGrygorii Strashko2015-12-092-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM TWD and Global timer are clocked by PERIPHCLK which is MPU_CLK/2. But now they are clocked by dpll_mpu_m2_ck == MPU_CLK and, as result. Timekeeping core misbehaves. For example, execution of command "sleep 5" will take 10 sec instead of 5. Hence, fix it by adding mpu_periphclk ("fixed-factor-clock") and use it for clocking ARM TWD and Global timer (same way as on OMAP4). Cc: Tony Lindgren <tony@atomide.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Tero Kristo <t-kristo@ti.com> Fixes:commit 8cbd4c2f6a99 ("arm: boot: dts: am4372: add ARM timers and SCU nodes") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | ARM: OMAP2+: enable REGULATOR_FIXED_VOLTAGEGrygorii Strashko2015-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable REGULATOR_FIXED_VOLTAGE for all OMAP2+ platforms otherwise system can't boot from SD-card when kernel is built for single SoC (for example, with CONFIG_SOC_DRA7XX=y only). It's also required for almost all TI SoC's platforms. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | ARM: dts: add dm816x missing spi DT dma handlesNeil Armstrong2015-11-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing SPI controller DMA handler in the dm816x DT node, only properties for the two channels on four were present. Cc: Brian Hutchinson <b.hutchman@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | ARM: dts: add dm816x missing #mbox-cellsNeil Armstrong2015-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing #mbox-cells for dm816x mbox DT node. Cc: Brian Hutchinson <b.hutchman@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * | | ARM: pxa: use PWM lookup table for all machinesArnd Bergmann2015-12-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change to use a pwm lookup table for the ezx machines was incomplete and only changed the a780 model, but not the other ones in the same file. This adds the missing calls to pwm_add_table(). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: c3322022897c ("ARM: pxa: ezx: Use PWM lookup table") Acked-by: Thierry Reding <thierry.reding@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
| * | | Merge tag 'berlin-fixes-for-4.4-rc1-1' of ↵Arnd Bergmann2015-12-111-3/+5
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/hesselba/linux-berlin into fixes Merge "Marvell Berlin fixes for 4.4-rc1 (round 1)" from Sebastian Hesselbarth: - fix wrong SDIO DT clocks on BG2Q * tag 'berlin-fixes-for-4.4-rc1-1' of git://git.infradead.org/users/hesselba/linux-berlin: ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1 ARM: dts: berlin: correct BG2Q's sdhci2 2nd clock
| | * | | ARM: dts: berlin: add 2nd clock for BG2Q sdhci0 and sdhci1Jisheng Zhang2015-12-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We removed CLK_IGNORE_UNUSED from CLKID_SDIO's flag, so the sdhci0 and sdhci1 don't work. We fix this by adding the optional 2nd clock for BG2Q's sdhci0 and sdhci1. This patch brings another benefit: the 2nd clock can be disabled during runtime pm, so saves power a bit. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
| | * | | ARM: dts: berlin: correct BG2Q's sdhci2 2nd clockJisheng Zhang2015-12-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The optional 2nd clock is CLKID_SDIO. We removed CLK_IGNORE_UNUSED from CLKID_SDIO's flag, so the sdhci2 doesn't work. This patch fixes this issue by correcting the sdhci2's 2nd clock. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
| * | | | Merge tag 'at91-4.4-fixes-2' of ↵Arnd Bergmann2015-12-113-2/+12
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux into fixes Merge "Second fixes for 4.4" from Alexandre Belloni: - fix of a hardware setup that prevents the sd/mmc interface to show up on sama5d2. - proper selection of pinctrl drivers. PIO4 is necessary for the sama5d2 to boot. * tag 'at91-4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: ARM: at91: fix pinctrl driver selection ARM: at91/dt: add always-on to 1.8V regulator
| | * | | | ARM: at91: fix pinctrl driver selectionLudovic Desroches2015-12-042-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the selection of the pinctrl driver to SoC family level since we have two pinctrl drivers. It is useless to select one which is not compatible with the SoC. [abelloni: fixed pm.c when only sama2d2 is selected] Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
| | * | | | ARM: at91/dt: add always-on to 1.8V regulatorNicolas Ferre2015-12-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the SDHCI controller needs the 1.8V line to be always enabled for some eMMC configurations, set the proper "regulator-always-on" property to the board DTS files. Note that the sdhci classical regulator definitions doesn't suit our controller for this 1.8V purpose. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
| * | | | | Merge tag 'samsung-fixes-4.4' of ↵Arnd Bergmann2015-12-114-4/+8
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into fixes Merge "Fixes for Exynos" from Krzysztof Kozlowski: 1. Fix potential NULL pointer dereference in Exynos PMU driver. 2. Remove incorrect __init annotation from s3c24xx cpufreq driver structures. * tag 'samsung-fixes-4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __init ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_conf
| | * | | | | cpufreq: s3c24xx: Do not mark s3c2410_plls_add as __initArnd Bergmann2015-11-273-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s3c2410_plls_add is a device notifier that may be called at runtime and is correctly not marked __init. However it calls s3c_plltab_register() which is marked __init, and that triggers a build error when we are checking for section mismatches: WARNING: vmlinux.o(.text+0x195e0): Section mismatch in reference from the function s3c2410_plls_add() to the function .init.text:s3c_plltab_register() The function s3c2410_plls_add() references the function __init s3c_plltab_register(). This is often because s3c2410_plls_add lacks a __init annotation or the annotation of s3c_plltab_register is wrong. This removes the __init annotation from s3c2410_plls_add as well as the __initdata section annotations from s3c2440_plls_12 and s3c2440_plls_169344, which in turn are referenced from s3c2410_plls_add. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
| | * | | | | ARM: EXYNOS: Fix potential NULL pointer access in exynos_sys_powerdown_confPankaj Dubey2015-11-171-1/+5
| | | |/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If no platform devices binded to the driver but driver itself loaded and exynos_sys_powerdown_conf is called from arch/arm/mach-exynos/{suspend.c, pm.c} it will result in NULL pointer access, to prevent this added check on pmu_context for NULL. Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
| * | | | | ARM: ixp4xx: fix read{b,w,l} return typesArnd Bergmann2015-12-011-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ixp4xx, the readl() function returns an 'unsigned long' output when indirect I/O is used. This is unlike any other platform, and it causes lots of harmless compiler warnings, such as: drivers/ata/libahci.c: In function 'ahci_show_host_version': drivers/ata/libahci.c:254:22: warning: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int' [-Wformat=] drivers/block/mtip32xx/mtip32xx.c: In function 'mtip_hw_read_registers': drivers/block/mtip32xx/mtip32xx.c:2602:31: warning: format '%X' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int' [-Wformat=] drivers/block/cciss.c: In function 'print_cfg_table': drivers/block/cciss.c:3845:25: warning: format '%d' expects argument of type 'int', but argument 4 has type 'long unsigned int' [-Wformat=] This changes all six of the ixp4xx specific I/O read functions to return the same types that we have in the normal asm/io.h, to avoid the warnings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Krzysztof Halasa <khalasa@piap.pl>
| * | | | | irqchip/versatile-fpga: Fix PCI IRQ mapping on Versatile PBGuillaume Delbergue2015-12-011-0/+5
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is specifically for PCI support on the Versatile PB board using a DT. Currently, the dynamic IRQ mapping is broken when using DTs. For example, on QEMU, the SCSI driver is unable to request the IRQ. To fix this issue, this patch replaces the current dynamic mechanism with a static value as is done in the non-DT case. Signed-off-by: Guillaume Delbergue <guillaume.delbergue@greensocs.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org
* | | | | Merge tag 'powerpc-4.4-4' of ↵Linus Torvalds2015-12-124-48/+34
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - opal-irqchip: Fix double endian conversion from Alistair Popple - cxl: Set endianess of kernel contexts from Frederic Barrat - sbc8641: drop bogus PHY IRQ entries from DTS file from Paul Gortmaker - Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" from Andrew Donnellan * tag 'powerpc-4.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: Revert "powerpc/eeh: Don't unfreeze PHB PE after reset" powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file cxl: Set endianess of kernel contexts powerpc/opal-irqchip: Fix double endian conversion
| * | | | | Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"Andrew Donnellan2015-12-091-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 527d10ef3a315d3cb9dc098dacd61889a6c26439. The reverted commit breaks cxlflash devices following an EEH reset (and possibly other cxl devices, however this has not been tested). The reverted commit changed the behaviour of eeh_reset_device() so that PHB PEs are not unfrozen following the completion of the reset. This should not be problematic, as no device resources should have been associated with the PHB PE. However, when attempting to load the cxlflash driver after a reset, the driver attempts to read Vital Product Data through a call to pci_read_vpd() (which is called on the physical cxl device, not on the virtual AFU device). pci_read_vpd() in turn attempts to read from the cxl device's config space. This fails, as the PE it's trying to read from is still frozen. In turn, the driver gets an -ENODEV and fails to initialise. It appears this issue only affects some parts of the VPD area, as "lspci -vvv", which only reads a subset of the VPD bytes, is not broken by the original patch. At this stage, we don't fully understand why we're trying to read a frozen PE, and we don't know how this affects other cxl devices. It is possible that there is an underlying bug in the cxl driver or the powerpc CAPI support code, or alternatively a bug in the PCI resource allocation/mapping code that is incorrectly mapping resources to PE#0. As such, this fix is incomplete, however it is necessary to prevent a serious regression in CAPI support. In the meantime, revert the commit, especially as it was intended to be a non-functional change. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Daniel Axtens <dja@axtens.net> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/sbc8641: drop bogus PHY IRQ entries from DTS filePaul Gortmaker2015-12-091-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file was originally cloned off of the MPC8641D-HPCN reference platform, which actually had a PHY IRQ line connected. However this board does not. The bogus entry was largely inert and went undetected until commit 321beec5047af83db90c88114b7e664b156f49fe ("net: phy: Use interrupts when available in NOLINK state") was added to the tree. With the above commit, the board fails to NFS boot since it sits waiting for a PHY IRQ event that of course never arrives. Removing the bogus entries from the DTS file fixes the issue. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | cxl: Set endianess of kernel contextsFrederic Barrat2015-12-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A process element (defined in CAIA) keeps track of the endianess of contexts through the Little Endian (LE) bit of the State Register. It is currently set for user contexts, but was somehow forgotten for kernel contexts, so this patch fixes it. It could lead to erratic behavior from an AFU when the context is attached through the kernel API. Fixes: 2f663527bd6a ("cxl: Configure PSL for kernel contexts and merge code") Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Suggested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/opal-irqchip: Fix double endian conversionAlistair Popple2015-12-081-29/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The OPAL event calls return a mask of events that are active in big endian format. This is checked when unmasking the events in the irqchip by comparison with a cached value. The cached value was stored in big endian format but should've been converted to CPU endian first. This bug leads to OPAL event delivery being delayed or dropped on some systems. Symptoms may include a non-functional console. The bug is fixed by calling opal_handle_events(...) instead of duplicating code in opal_event_unmask(...). Fixes: 9f0fd0499d30 ("powerpc/powernv: Add a virtual irqchip for opal events") Cc: stable@vger.kernel.org # v4.2+ Reported-by: Douglas L Lehr <dllehr@us.ibm.com> Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-12-1218-62/+77
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MIPS: fix DMA contiguous allocation sh64: fix __NR_fgetxattr ocfs2: fix SGID not inherited issue mm/oom_kill.c: avoid attempting to kill init sharing same memory drivers/base/memory.c: prohibit offlining of memory blocks with missing sections tmpfs: fix shmem_evict_inode() warnings on i_blocks mm/hugetlb.c: fix resv map memory leak for placeholder entries mm: hugetlb: call huge_pte_alloc() only if ptep is null kernel: remove stop_machine() Kconfig dependency mm: kmemleak: mark kmemleak_init prototype as __init mm: fix kerneldoc on mem_cgroup_replace_page osd fs: __r4w_get_page rely on PageUptodate for uptodate MAINTAINERS: make Vladimir co-maintainer of the memory controller mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo memcg: fix memory.high target mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
| * | | | | | MIPS: fix DMA contiguous allocationQais Yousef2015-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes to how GFP_ATOMIC is defined seems to have broken the condition to use mips_alloc_from_contiguous() in mips_dma_alloc_coherent(). I couldn't bottom out the exact change but I think it's this commit d0164adc89f6 ("mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd"). GFP_ATOMIC has multiple bits set and the check for !(gfp & GFP_ATOMIC) isn't enough. The reason behind this condition is to check whether we can potentially do a sleeping memory allocation. Use gfpflags_allow_blocking() instead which should be more robust. Signed-off-by: Qais Yousef <qais.yousef@imgtec.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | sh64: fix __NR_fgetxattrDmitry V. Levin2015-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr has to be defined to 259, but it doesn't. Instead, it's defined to 269, which is of course used by another syscall, __NR_sched_setaffinity in this case. This bug was found by strace test suite. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | ocfs2: fix SGID not inherited issueJunxiao Bi2015-12-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue, SGID of sub dir was not inherited from its parents dir. It is because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten by "mode" which don't have SGID set later. Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm/oom_kill.c: avoid attempting to kill init sharing same memoryChen Jie2015-12-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible that an oom killed victim shares an ->mm with the init process and thus oom_kill_process() would end up trying to kill init as well. This has been shown in practice: Out of memory: Kill process 9134 (init) score 3 or sacrifice child Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB Kill process 1 (init) sharing same memory ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 And this will result in a kernel panic. If a process is forked by init and selected for oom kill while still sharing init_mm, then it's likely this system is in a recoverable state. However, it's better not to try to kill init and allow the machine to panic due to unkillable processes. [rientjes@google.com: rewrote changelog] [akpm@linux-foundation.org: fix inverted test, per Ben] Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | drivers/base/memory.c: prohibit offlining of memory blocks with missing sectionsSeth Jennings2015-12-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems") and 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit") introduced large block sizes for x86. This made it possible to have multiple sections per memory block where previously, there was a only every one section per block. Since blocks consist of contiguous ranges of section, there can be holes in the blocks where sections are not present. If one attempts to offline such a block, a crash occurs since the code is not designed to deal with this. This patch is a quick fix to gaurd against the crash by not allowing blocks with non-present sections to be offlined. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781 Signed-off-by: Seth Jennings <sjennings@variantweb.net> Reported-by: Andrew Banman <abanman@sgi.com> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Russ Anderson <rja@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | tmpfs: fix shmem_evict_inode() warnings on i_blocksHugh Dickins2015-12-121-20/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry Vyukov provides a little program, autogenerated by syzkaller, which races a fault on a mapping of a sparse memfd object, against truncation of that object below the fault address: run repeatedly for a few minutes, it reliably generates shmem_evict_inode()'s WARN_ON(inode->i_blocks). (But there's nothing specific to memfd here, nor to the fstat which it happened to use to generate the fault: though that looked suspicious, since a shmem_recalc_inode() had been added there recently. The same problem can be reproduced with open+unlink in place of memfd_create, and with fstatfs in place of fstat.) v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING") explains one cause of such a warning (a race with shmem_writepage to swap), and possible solutions; but we never took it further, and this syzkaller incident turns out to have a different cause. shmem_getpage_gfp()'s error recovery, when a freshly allocated page is then found to be beyond eof, looks plausible - decrementing the alloced count that was just before incremented - but in fact can go wrong, if a racing thread (the truncator, for example) gets its shmem_recalc_inode() in just after our delete_from_page_cache(). delete_from_page_cache() decrements nrpages, that shmem_recalc_inode() will balance the books by decrementing alloced itself, then our decrement of alloced take it one too low: leading to the WARNING when the object is finally evicted. Once the new page has been exposed in the page cache, shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get the accounting right in all cases (and not fall through from "trunc:" to "decused:"). Adjust that error recovery block; and the reinitialization of info and sbinfo can be removed too. While we're here, fix shmem_writepage() to avoid the original issue: it will be safe against a racing shmem_recalc_inode(), if it merely increments swapped before the shmem_delete_from_page_cache() which decrements nrpages (but it must then do its own shmem_recalc_inode() before that, while still in balance, instead of after). (Aside: why do we shmem_recalc_inode() here in the swap path? Because its raison d'etre is to cope with clean sparse shmem pages being reclaimed behind our back: so here when swapping is a good place to look for that case.) But I've not now managed to reproduce this bug, even without the patch. I don't see why I didn't do that earlier: perhaps inhibited by the preference to eliminate shmem_recalc_inode() altogether. Driven by this incident, I do now have a patch to do so at last; but still want to sit on it for a bit, there's a couple of questions yet to be resolved. Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm/hugetlb.c: fix resv map memory leak for placeholder entriesMike Kravetz2015-12-121-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry Vyukov reported the following memory leak unreferenced object 0xffff88002eaafd88 (size 32): comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s) hex dump (first 32 bytes): 28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff (.Nc....(.Nc.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: kmalloc include/linux/slab.h:458 region_chg+0x2d4/0x6b0 mm/hugetlb.c:398 __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791 vma_needs_reservation mm/hugetlb.c:1813 alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845 hugetlb_no_page mm/hugetlb.c:3543 hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717 follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880 __get_user_pages+0x542/0xf30 mm/gup.c:497 populate_vma_page_range+0xde/0x110 mm/gup.c:919 __mm_populate+0x1c7/0x310 mm/gup.c:969 do_mlock+0x291/0x360 mm/mlock.c:637 SYSC_mlock2 mm/mlock.c:658 SyS_mlock2+0x4b/0x70 mm/mlock.c:648 Dmitry identified a potential memory leak in the routine region_chg, where a region descriptor is not free'ed on an error path. However, the root cause for the above memory leak resides in region_del. In this specific case, a "placeholder" entry is created in region_chg. The associated page allocation fails, and the placeholder entry is left in the reserve map. This is "by design" as the entry should be deleted when the map is released. The bug is in the region_del routine which is used to delete entries within a specific range (and when the map is released). region_del did not handle the case where a placeholder entry exactly matched the start of the range range to be deleted. In this case, the entry would not be deleted and leaked. The fix is to take these special placeholder entries into account in region_del. The region_chg error path leak is also fixed. Fixes: feba16e25a57 ("mm/hugetlb: add region_del() to delete a specific range of entries") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm: hugetlb: call huge_pte_alloc() only if ptep is nullNaoya Horiguchi2015-12-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into migration/hwpoison entry after the huge_pte_offset() check. This race results in BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when the huge_pte_offset() returns non-NULL, so let's fix this bug with moving the code into else block. Note that the *ptep could turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4a250 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | kernel: remove stop_machine() Kconfig dependencyChris Wilson2015-12-123-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the full stop_machine() routine is only enabled on SMP if module unloading is enabled, or if the CPUs are hotpluggable. This leads to configurations where stop_machine() is broken as it will then only run the callback on the local CPU with irqs disabled, and not stop the other CPUs or run the callback on them. For example, this breaks MTRR setup on x86 in certain configs since ea8596bb2d8d379 ("kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions") as the MTRR is only established on the boot CPU. This patch removes the Kconfig option for STOP_MACHINE and uses the SMP and HOTPLUG_CPU config options to compile the correct stop_machine() for the architecture, removing the false dependency on MODULE_UNLOAD in the process. Link: https://lkml.org/lkml/2014/10/8/124 References: https://bugs.freedesktop.org/show_bug.cgi?id=84794 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm: kmemleak: mark kmemleak_init prototype as __initNicolas Iooss2015-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kmemleak_init() definition in mm/kmemleak.c is marked __init but its prototype in include/linux/kmemleak.h is marked __ref since commit a6186d89c913 ("kmemleak: Mark the early log buffer as __initdata"). This causes a section mismatch which is reported as a warning when building with clang -Wsection, because kmemleak_init() is declared in section .ref.text but defined in .init.text. Fix this by marking kmemleak_init() prototype __init. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm: fix kerneldoc on mem_cgroup_replace_pageHugh Dickins2015-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whoops, I missed removing the kerneldoc comment of the lrucare arg removed from mem_cgroup_replace_page; but it's a good comment, keep it. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | osd fs: __r4w_get_page rely on PageUptodate for uptodateHugh Dickins2015-12-122-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 42cb14b110a5 ("mm: migrate dirty page without clear_page_dirty_for_io etc") simplified the migration of a PageDirty pagecache page: one stat needs moving from zone to zone and that's about all. It's convenient and safest for it to shift the PageDirty bit from old page to new, just before updating the zone stats: before copying data and marking the new PageUptodate. This is all done while both pages are isolated and locked, just as before; and just as before, there's a moment when the new page is visible in the radix_tree, but not yet PageUptodate. What's new is that it may now be briefly visible as PageDirty before it is PageUptodate. When I scoured the tree to see if this could cause a problem anywhere, the only places I found were in two similar functions __r4w_get_page(): which look up a page with find_get_page() (not using page lock), then claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate. I'm not sure whether that was right before, but now it might be wrong (on rare occasions): only claim the page is uptodate if PageUptodate. Or perhaps the page in question could never be migratable anyway? Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | MAINTAINERS: make Vladimir co-maintainer of the memory controllerJohannes Weiner2015-12-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vladimir architected and authored much of the current state of the memcg's slab memory accounting and tracking. Make sure he gets CC'd on bug reports ;-) Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any ↵Michal Hocko2015-12-122-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | progress Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfoVlastimil Babka2015-12-122-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 016c13daa5c9 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and MIGRATE_RECLAIMABLE in the enum definition. However, migratetype_names wasn't updated to reflect that. As a result, the file /proc/pagetypeinfo shows the counts for Movable as Reclaimable and vice versa. Additionally, commit 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") introduced MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into show_migration_types(), so it doesn't appear in the listing of free areas during page alloc failures or oom kills. This patch fixes both problems. The atomic reserves will show with a letter 'H' in the free areas listings. Fixes: 016c13daa5c9 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types") Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | memcg: fix memory.high targetVladimir Davydov2015-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the memory.high threshold is exceeded, try_charge() schedules a task_work to reclaim the excess. The reclaim target is set to the number of pages requested by try_charge(). This is wrong, because try_charge() usually charges more pages than requested (batch > nr_pages) in order to refill per cpu stocks. As a result, a process in a cgroup can easily exceed memory.high significantly when doing a lot of charges w/o returning to userspace (e.g. reading a file in big chunks). Fix this issue by assuring that when exceeding memory.high a process reclaims as many pages as were actually charged (i.e. batch). Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | mm: hugetlb: fix hugepage memory leak caused by wrong reserve countNaoya Horiguchi2015-12-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator. In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement h->resv_huge_pages, which means that successful hugetlb_fault() returns without releasing the reserve count. As a result, subsequent hugetlb_fault() might fail despite that there are still free hugepages. This patch simply adds decrementing code on that code path. I reproduced this problem when testing v4.3 kernel in the following situation: - the test machine/VM is a NUMA system, - hugepage overcommiting is enabled, - most of hugepages are allocated and there's only one free hugepage which is on node 0 (for example), - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to node 1, tries to allocate a hugepage, - the allocation should fail but the reserve count is still hold. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge branch 'parisc-4.4-3' of ↵Linus Torvalds2015-12-125-27/+13
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "Fix the boot crash on Mako machines with Huge Pages, prevent a panic with SATA controllers (and others) by correctly calculating the IOMMU space, hook up the mlock2 syscall and drop unneeded code in the parisc pci code" * 'parisc-4.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Disable huge pages on Mako machines parisc: Wire up mlock2 syscall parisc: Remove unused pcibios_init_bus() parisc iommu: fix panic due to trying to allocate too large region
| * | | | | | | parisc: Disable huge pages on Mako machinesHelge Deller2015-12-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mako-based machines (PA8800 and PA8900 CPUs) don't allow aliasing on non-equaivalent addresses. Signed-off-by: Helge Deller <deller@gmx.de>