summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* mfd: cros_ec: Add API for rwsigGwendal Grignou2019-06-101-0/+26
| | | | | | | | | | | Add command to retrieve signature of image stored in the RW memory slot(s). Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add API for Fingerprint supportGwendal Grignou2019-06-101-0/+228
| | | | | | | | | | Add API for fingerprint sensor presented by embedded controller. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add API for Touchpad supportGwendal Grignou2019-06-101-0/+26
| | | | | | | | | | Add API to control touchpad presented by Embedded Controller. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add API for EC-EC communicationGwendal Grignou2019-06-101-0/+95
| | | | | | | | | | | Allow EC to talk to other ECs that are not presented to the host. Neeed when EC are present in detachable keyboard. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add I2C passthru protection APIGwendal Grignou2019-06-101-0/+22
| | | | | | | | | | Prevent direct i2c access to device behind EC when not in development mode. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add Smart Battery Firmware update APIGwendal Grignou2019-06-101-0/+73
| | | | | | | | | | Add API to update battery firmware. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add Hibernate APIGwendal Grignou2019-06-101-2/+70
| | | | | | | | | | Add support for controlling hibernation of the Embedded Controller. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add API for keyboard testingGwendal Grignou2019-06-101-0/+18
| | | | | | | | | | Add command to allow keyboard testing in factory. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Complete Power and USB PD APIGwendal Grignou2019-06-101-8/+228
| | | | | | | | | | Improve API for USB Powe delivery and power management. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Fix temperature APIGwendal Grignou2019-06-101-7/+57
| | | | | | | | | | Improve API to retrieve temperature information. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add fingerprint APIGwendal Grignou2019-06-101-0/+34
| | | | | | | | | | Add support for fingerprint sensors managed by embedded controller. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Fix event processing APIGwendal Grignou2019-06-101-11/+76
| | | | | | | | | | Improve API between EC and Host to report events. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Complete MEMS sensor APIGwendal Grignou2019-06-101-58/+406
| | | | | | | | | | Add new command for batched mode, add support for more sensors. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add EC transport protocol v4Gwendal Grignou2019-06-101-2/+141
| | | | | | | | | | Introduce a new transport procotol between EC and host. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Expand hash APIGwendal Grignou2019-06-101-2/+9
| | | | | | | | | | Improve API to verify EC image signature. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add lightbar v2 APIGwendal Grignou2019-06-101-4/+120
| | | | | | | | | | New API split commands, improve EC command latency. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add PWM_SET_DUTY APIGwendal Grignou2019-06-101-2/+18
| | | | | | | | | | Add API for fan control. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Add Flash V2 commands APIGwendal Grignou2019-06-101-3/+147
| | | | | | | | | | Added for supporting larger embedded controller flash. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Remove zero-size structsGwendal Grignou2019-06-101-15/+18
| | | | | | | | | | | | Empty structure size is different between C and C++. To prevent clang warning when compiling this include file in C++ programs, remove empty structures. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: move HDMI CEC API definitionGwendal Grignou2019-06-101-73/+75
| | | | | | | | | | Move near the end of file. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Update ACPI interface definitionGwendal Grignou2019-06-101-126/+293
| | | | | | | | | | | Add more fields and improve API when EC presents data through ACPI memory space. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: use BIT macroGwendal Grignou2019-06-101-55/+55
| | | | | | | | | | Replace (1 << ...) with BIT(). Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Define commands as 4-digit UPPER CASE hex valuesGwendal Grignou2019-06-101-99/+136
| | | | | | | | | | | This change is required for compilation of embedded controller firmware to work properly (See CONFIG_HOSTCMD_SECTION_SORTED). Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: add ec_align macrosGwendal Grignou2019-06-101-204/+238
| | | | | | | | | | | | | | To reduce code and improve performance of the embedded controller firmware, pragma __aligned(2) or __aligned(4) are used when alignment to 16 or 32 bit boundary is expected. Define all ec_align to packed when compiling kernel. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: set comments properlyGwendal Grignou2019-06-101-25/+40
| | | | | | | | | | Fix comments syntax and spelling errors. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Zero BUILD_ macroGwendal Grignou2019-06-101-0/+5
| | | | | | | | | | | Defined out build macro used when compiling embedded controller firmware. Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* mfd: cros_ec: Update license termGwendal Grignou2019-06-101-15/+5
| | | | | | | | | | Update to SPDX-License-Identifier, GPL-2.0 Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Acked-by: Benson Leung <bleung@chromium.org> Reviewed-by: Fabien Lahoudere <fabien.lahoudere@collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2019-05-192-2/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge yet more updates from Andrew Morton: "A few final bits: - large changes to vmalloc, yielding large performance benefits - tweak the console-flush-on-panic code - a few fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: panic: add an option to replay all the printk message in buffer initramfs: don't free a non-existent initrd fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro mm/vmalloc.c: keep track of free blocks for vmap allocation
| * panic: add an option to replay all the printk message in bufferFeng Tang2019-05-181-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently on panic, kernel will lower the loglevel and print out pending printk msg only with console_flush_on_panic(). Add an option for users to configure the "panic_print" to replay all dmesg in buffer, some of which they may have never seen due to the loglevel setting, which will help panic debugging . [feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()] Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com [feng.tang@intel.com: use logbuf lock to protect the console log index] Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm/vmalloc.c: keep track of free blocks for vmap allocationUladzislau Rezki (Sony)2019-05-181-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "improve vmap allocation", v3. Objective --------- Please have a look for the description at: https://lkml.org/lkml/2018/10/19/786 but let me also summarize it a bit here as well. The current implementation has O(N) complexity. Requests with different permissive parameters can lead to long allocation time. When i say "long" i mean milliseconds. Description ----------- This approach organizes the KVA memory layout into free areas of the 1-ULONG_MAX range, i.e. an allocation is done over free areas lookups, instead of finding a hole between two busy blocks. It allows to have lower number of objects which represent the free space, therefore to have less fragmented memory allocator. Because free blocks are always as large as possible. It uses the augment tree where all free areas are sorted in ascending order of va->va_start address in pair with linked list that provides O(1) access to prev/next elements. Since the tree is augment, we also maintain the "subtree_max_size" of VA that reflects a maximum available free block in its left or right sub-tree. Knowing that, we can easily traversal toward the lowest (left most path) free area. Allocation: ~O(log(N)) complexity. It is sequential allocation method therefore tends to maximize locality. The search is done until a first suitable block is large enough to encompass the requested parameters. Bigger areas are split. I copy paste here the description of how the area is split, since i described it in https://lkml.org/lkml/2018/10/19/786 <snip> A free block can be split by three different ways. Their names are FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e. they correspond to how requested size and alignment fit to a free block. FL_FIT_TYPE - in this case a free block is just removed from the free list/tree because it fully fits. Comparing with current design there is an extra work with rb-tree updating. LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit. In this case what we do is just cutting a free block. It is as fast as a current design. Most of the vmalloc allocations just end up with this case, because the edge is always aligned to 1. NE_FIT_TYPE - Is much less common case. Basically it happens when requested size and alignment does not fit left nor right edges, i.e. it is between them. In this case during splitting we have to build a remaining left free area and place it back to the free list/tree. Comparing with current design there are two extra steps. First one is we have to allocate a new vmap_area structure. Second one we have to insert that remaining free block to the address sorted list/tree. In order to optimize a first case there is a cache with free_vmap objects. Instead of allocating from slab we just take an object from the cache and reuse it. Second one is pretty optimized. Since we know a start point in the tree we do not do a search from the top. Instead a traversal begins from a rb-tree node we split. <snip> De-allocation. ~O(log(N)) complexity. An area is not inserted straight away to the tree/list, instead we identify the spot first, checking if it can be merged around neighbors. The list provides O(1) access to prev/next, so it is pretty fast to check it. Summarizing. If merged then large coalesced areas are created, if not the area is just linked making more fragments. There is one more thing that i should mention here. After modification of VA node, its subtree_max_size is updated if it was/is the biggest area in its left or right sub-tree. Apart of that it can also be populated back to upper levels to fix the tree. For more details please have a look at the __augment_tree_propagate_from() function and the description. Tests and stressing ------------------- I use the "test_vmalloc.sh" test driver available under "tools/testing/selftests/vm/" since 5.1-rc1 kernel. Just trigger "sudo ./test_vmalloc.sh" to find out how to deal with it. Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA. Regarding last one, i do not have any physical access to NUMA system, therefore i emulated it. The time of stressing is days. If you run the test driver in "stress mode", you also need the patch that is in Andrew's tree but not in Linux 5.1-rc1. So, please apply it: http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c After massive testing, i have not identified any problems like memory leaks, crashes or kernel panics. I find it stable, but more testing would be good. Performance analysis -------------------- I have used two systems to test. One is i5-3320M CPU @ 2.60GHz and another is HiKey960(arm64) board. i5-3320M runs on 4.20 kernel, whereas Hikey960 uses 4.15 kernel. I have both system which could run on 5.1-rc1 as well, but the results have not been ready by time i an writing this. Currently it consist of 8 tests. There are three of them which correspond to different types of splitting(to compare with default). We have 3 ones(see above). Another 5 do allocations in different conditions. a) sudo ./test_vmalloc.sh performance When the test driver is run in "performance" mode, it runs all available tests pinned to first online CPU with sequential execution test order. We do it in order to get stable and repeatable results. Take a look at time difference in "long_busy_list_alloc_test". It is not surprising because the worst case is O(N). # i5-3320M How many cycles all tests took: CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt # Hikey960 8x CPUs How many cycles all tests took: CPU0=3478683207 cycles vs CPU0=463767978 cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt b) time sudo ./test_vmalloc.sh test_repeat_count=1 With this configuration, all tests are run on all available online CPUs. Before running each CPU shuffles its tests execution order. It gives random allocation behaviour. So it is rough comparison, but it puts in the picture for sure. # i5-3320M <default> vs <patched> real 101m22.813s real 0m56.805s user 0m0.011s user 0m0.015s sys 0m5.076s sys 0m0.023s # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt # Hikey960 8x CPUs <default> vs <patched> real unknown real 4m25.214s user unknown user 0m0.011s sys unknown sys 0m0.670s I did not manage to complete this test on "default Hikey960" kernel version. After 24 hours it was still running, therefore i had to cancel it. That is why real/user/sys are "unknown". This patch (of 3): Currently an allocation of the new vmap area is done over busy list iteration(complexity O(n)) until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown. Due to over fragmented list and different permissive parameters an allocation can take a long time. For example on embedded devices it is milliseconds. This patch organizes the KVA memory layout into free areas of the 1-ULONG_MAX range. It uses an augment red-black tree that keeps blocks sorted by their offsets in pair with linked list keeping the free space in order of increasing addresses. Nodes are augmented with the size of the maximum available free block in its left or right sub-tree. Thus, that allows to take a decision and traversal toward the block that will fit and will have the lowest start address, i.e. it is sequential allocation. Allocation: to allocate a new block a search is done over the tree until a suitable lowest(left most) block is large enough to encompass: the requested size, alignment and vstart point. If the block is bigger than requested size - it is split. De-allocation: when a busy vmap area is freed it can either be merged or inserted to the tree. Red-black tree allows efficiently find a spot whereas a linked list provides a constant-time access to previous and next blocks to check if merging can be done. In case of merging of de-allocated memory chunk a large coalesced area is created. Complexity: ~O(log(N)) [urezki@gmail.com: v3] Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'i2c/for-next' of ↵Linus Torvalds2019-05-191-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "Some I2C core API additions which are kind of simple but enhance error checking for users a lot, especially by returning errno now. There are wrappers to still support the old API but it will be removed once all users are converted" * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: add device-managed version of i2c_new_dummy i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
| * | i2c: core: add device-managed version of i2c_new_dummyHeiner Kallweit2019-05-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i2c_new_dummy is typically called from the probe function of the driver for the primary i2c client. It requires calls to i2c_unregister_device in the error path of the probe function and in the remove function. This can be simplified by introducing a device-managed version. Note the changed error case return value type: i2c_new_dummy returns NULL whilst devm_i2c_new_dummy_device returns an ERR_PTR. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> [wsa: rename new functions and fix minor kdoc issues] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Peter Rosin <peda@axentia.se> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* | | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2019-05-191-3/+5
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some bug fixes, and an update to the URL's for the final version of Unicode 12.1.0" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid panic during forced reboot due to aborted journal ext4: fix block validity checks for journal inodes using indirect blocks unicode: update to Unicode 12.1.0 final unicode: add missing check for an error return from utf8lookup() ext4: fix miscellaneous sparse warnings ext4: unsigned int compared against zero ext4: fix use-after-free in dx_release() ext4: fix data corruption caused by overlapping unaligned and aligned IO jbd2: fix potential double free ext4: zero out the unused memory region in the extent tree block
| * | | jbd2: fix potential double freeChengguang Xu2019-05-101-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2019-05-191-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource updates from Ingo Molnar: "Misc clocksource/clockevent driver updates that came in a bit late but are ready for v5.2" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: misc: atmel_tclib: Do not probe already used TCBs clocksource/drivers/timer-atmel-tcb: Convert tc_clksrc_suspend|resume() to static clocksource/drivers/tcb_clksrc: Rename the file for consistency clocksource/drivers/timer-atmel-pit: Rework Kconfig option clocksource/drivers/tcb_clksrc: Move Kconfig option ARM: at91: Implement clocksource selection clocksource/drivers/tcb_clksrc: Use tcb as sched_clock clocksource/drivers/tcb_clksrc: Stop depending on atmel_tclib ARM: at91: move SoC specific definitions to SoC folder clocksource/drivers/timer-milbeaut: Cleanup common register accesses clocksource/drivers/timer-milbeaut: Add shutdown function clocksource/drivers/timer-milbeaut: Fix to enable one-shot timer clocksource/drivers/tegra: Rework for compensation of suspend time clocksource/drivers/sp804: Add COMPILE_TEST to CONFIG_ARM_TIMER_SP804 clocksource/drivers/sun4i: Add a compatible for suniv dt-bindings: timer: Add Allwinner suniv timer
| * | | | ARM: at91: move SoC specific definitions to SoC folderAlexandre Belloni2019-05-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move linux/atmel_tc.h to the SoC specific folder include/soc/at91. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Thierry Reding <thierry.reding@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
* | | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2019-05-197-8/+214
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ chip updates from Ingo Molnar: "A late irqchips update: - New TI INTR/INTA set of drivers - Rewrite of the stm32mp1-exti driver as a platform driver - Update the IOMMU MSI mapping API to be RT friendly - A number of cleanups and other low impact fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) iommu/dma-iommu: Remove iommu_dma_map_msi_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts genirq/msi: Add a new field in msi_desc to store an IOMMU cookie arm64: arch_k3: Enable interrupt controller drivers irqchip/ti-sci-inta: Add msi domain support soc: ti: Add MSI domain bus support for Interrupt Aggregator irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings irqchip/ti-sci-intr: Add support for Interrupt Router driver dt-bindings: irqchip: Introduce TISCI Interrupt router bindings gpio: thunderx: Use the default parent apis for {request,release}_resources genirq: Introduce irq_chip_{request,release}_resource_parent() apis firmware: ti_sci: Add helper apis to manage resources firmware: ti_sci: Add RM mapping table for am654 firmware: ti_sci: Add support for IRQ management firmware: ti_sci: Add support for RM core ops ...
| * | | | | iommu/dma-iommu: Remove iommu_dma_map_msi_msg()Julien Grall2019-05-031-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent change split iommu_dma_map_msi_msg() in two new functions. The function was still implemented to avoid modifying all the callers at once. Now that all the callers have been reworked, iommu_dma_map_msi_msg() can be removed. Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two partsJulien Grall2019-05-031-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On RT, iommu_dma_map_msi_msg() may be called from non-preemptible context. This will lead to a splat with CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock (they can sleep on RT). iommu_dma_map_msi_msg() is used to map the MSI page in the IOMMU PT and update the MSI message with the IOVA. Only the part to lookup for the MSI page requires to be called in preemptible context. As the MSI page cannot change over the lifecycle of the MSI interrupt, the lookup can be cached and re-used later on. iomma_dma_map_msi_msg() is now split in two functions: - iommu_dma_prepare_msi(): This function will prepare the mapping in the IOMMU and store the cookie in the structure msi_desc. This function should be called in preemptible context. - iommu_dma_compose_msi_msg(): This function will update the MSI message with the IOVA when the device is behind an IOMMU. Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | genirq/msi: Add a new field in msi_desc to store an IOMMU cookieJulien Grall2019-05-031-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an MSI doorbell is located downstream of an IOMMU, it is required to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domain. At the moment, the allocation of the mapping may be done when composing the message. However, the composing may be done in non-preemtible context while the allocation requires to be called from preemptible context. A follow-up change will split the current logic in two functions requiring to keep an IOMMU cookie per MSI. A new field is introduced in msi_desc to store an IOMMU cookie. As the cookie may not be required in some configuration, the field is protected under a new config CONFIG_IRQ_MSI_IOMMU. A pair of helpers has also been introduced to access the field. Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | soc: ti: Add MSI domain bus support for Interrupt AggregatorLokesh Vutla2019-05-013-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the system coprocessor managing the range allocation of the inputs to Interrupt Aggregator, it is difficult to represent the device IRQs from DT. The suggestion is to use MSI in such cases where devices wants to allocate and group interrupts dynamically. Create a MSI domain bus layer that allocates and frees MSIs for a device. APIs that are implemented: - ti_sci_inta_msi_create_irq_domain() that creates a MSI domain - ti_sci_inta_msi_domain_alloc_irqs() that creates MSIs for the specified device and resource. - ti_sci_inta_msi_domain_free_irqs() frees the irqs attached to the device. - ti_sci_inta_msi_get_virq() for getting the virq attached to a specific event. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | genirq: Introduce irq_chip_{request,release}_resource_parent() apisLokesh Vutla2019-05-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce irq_chip_{request,release}_resource_parent() apis so that these can be used in hierarchical irqchips. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | firmware: ti_sci: Add helper apis to manage resourcesLokesh Vutla2019-05-011-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each resource with in the device can be uniquely identified as defined by TISCI. Since this is generic across the devices, resource allocation also can be made generic instead of each client driver handling the resource. So add helper apis to manage the resource. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | firmware: ti_sci: Add support for IRQ managementLokesh Vutla2019-05-011-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TISCI abstracts the handling of IRQ routes where interrupt sources are not directly connected to host interrupt controller. Add support for the set of TISCI commands for requesting and releasing IRQs. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | firmware: ti_sci: Add support for RM core opsLokesh Vutla2019-05-011-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TISCI provides support for getting the resources(IRQ, RING etc..) assigned to a specific device. These resources can be handled by the client and in turn sends TISCI cmd to configure the resources. It is very important that client should keep track on usage of these resources. Add support for TISCI commands to get resource ranges. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | firmware: ti_sci: Add support to get TISCI handle using of_phandleGrygorii Strashko2019-05-011-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TISCI has been updated to have support for Resource management(like interrupts etc..). And there can be multiple device instances of a resource type in a SoC. So every driver corresponding to a resource type should get a TISCI handle so that it can make TISCI calls. And each DT node corresponding to a device should exist under its corresponding bus node as per the SoC architecture. But existing apis in TISCI library assumes that all TISCI users are child nodes of TISCI. Which is not true in the above case. So introduce (devm_)ti_sci_get_by_phandle() apis that can be used by TISCI users to get TISCI handle using of phandle property. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | | | | irqchip/gic-v3-its: fix some definitions of inner cacheability attributesHongbo Yao2019-04-291-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some definitions of Inner Cacheability attibutes need to be corrected. Fixes: 8c828a535e29f ("irqchip/gicv3-its: Restore all cacheability attributes") Signed-off-by: Hongbo Yao <yaohongbo@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | | | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2019-05-191-1/+0
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "This fixes a particularly thorny munmap() bug with MPX, plus fixes a host build environment assumption in objtool" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Allow AR to be overridden with HOSTAR x86/mpx, mm/core: Fix recursive munmap() corruption
| * | | | | | x86/mpx, mm/core: Fix recursive munmap() corruptionDave Hansen2019-05-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody uses MPX. MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space. But, the recursive do_munmap() that gets called vi arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more. See the "long story" further below for the gory details. To fix this, call arch_unmap() before __do_unmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup. == UML / unicore32 impact == Remove unused 'vma' argument to arch_unmap(). No functional change. I compile tested this on UML but not unicore32. == powerpc impact == powerpc uses arch_unmap() well to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I can not find any likely impact to powerpc, other than the zeroing happening a little earlier. powerpc does not use the 'vma' argument and is unaffected by its removal. I compile-tested a 64-bit powerpc defconfig. == x86 impact == For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection since the bounds table got reset (zeroed). I can't imagine anyone doing this: ptr = mmap(); // use ptr ret = munmap(ptr); if (ret) // oh, there was an error, I'll // keep using ptr. Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_. This passes the original reproducer from Richard Biener as well as the existing mpx selftests/. The long story: munmap() has a couple of pieces: 1. Find the affected VMA(s) 2. Split the start/end one(s) if neceesary 3. Pull the VMAs out of the rbtree 4. Actually zap the memory via unmap_region(), including freeing page tables (or queueing them to be freed). 5. Fix up some of the accounting (like fput()) and actually free the VMA itself. This specific ordering was actually introduced by: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'. Richard Biener reported a test that shows this in dmesg: [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576 What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that has not been properly zapped. But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use. I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree which would force __do_munmap() to re-walk to restart. That spiralled out of control in complexity pretty fast. Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix. This was also reported in the following kernel bugzilla entry: https://bugzilla.kernel.org/show_bug.cgi?id=203123 There are some reports that this commit triggered this bug: dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap") While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits. [ mingo: Minor edits to the changelog and the patch. ] Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds2019-05-191-0/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM SoC late updates from Olof Johansson: "This is some material that we picked up into our tree late. Most of it are smaller fixes and additions, some defconfig updates due to recent development, etc. Code-wise the largest portion is a series of PM updates for the at91 platform, and those have been in linux-next a while through the at91 tree before we picked them up" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits) arm64: dts: sprd: Add clock properties for serial devices Opt out of scripts/get_maintainer.pl ARM: ixp4xx: Remove duplicated include from common.c soc: ixp4xx: qmgr: Fix an NULL vs IS_ERR() check in probe arm64: tegra: Disable XUSB support on Jetson TX2 arm64: tegra: Enable SMMU translation for PCI on Tegra186 arm64: tegra: Fix insecure SMMU users for Tegra186 arm64: tegra: Select ARM_GIC_PM amba: tegra-ahb: Mark PM functions as __maybe_unused ARM: dts: logicpd-som-lv: Fix MMC1 card detect ARM: mvebu: drop return from void function ARM: mvebu: prefix coprocessor operand with p ARM: mvebu: drop unnecessary label ARM: mvebu: fix a leaked reference by adding missing of_node_put ARM: socfpga_defconfig: enable LTC2497 ARM: mvebu: kirkwood: remove error message when retrieving mac address ARM: at91: sama5: make ov2640 as a module ARM: OMAP1: ams-delta: fix early boot crash when LED support is disabled ARM: at91: remove HAVE_FB_ATMEL for sama5 SoC as they use DRM soc/fsl/qe: Fix an error code in qe_pin_request() ...