summaryrefslogtreecommitdiffstats
path: root/mm/damon
Commit message (Collapse)AuthorAgeFilesLines
* mm/damon/vaddr: change asm-generic/mman-common.h to linux/mman.hTanzir Hasan2023-12-291-1/+1
| | | | | | | | | | | | | asm-generic/mman-common.h can be replaced by linux/mman.h and the file will still build correctly. It is an asm-generic file which should be avoided if possible. Link: https://lkml.kernel.org/r/20231221-asmgenericvaddr-v1-1-742b170c914e@google.com Fixes: 6dea8add4d28 ("mm/damon/vaddr: support DAMON-based Operation Schemes") Signed-off-by: Tanzir Hasan <tanzirh@google.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/core-test: test max_nr_accesses overflow caused divide-by-zeroSeongJae Park2023-12-201-0/+11
| | | | | | | | | | | | Commit 35f5d94187a6 ("mm/damon: implement a function for max nr_accesses safe calculation") has fixed an overflow bug that could cause divide-by-zero. Add a kunit test for the bug to ensure similar bugs are not introduced again. Link: https://lkml.kernel.org/r/20231213190338.54146-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon: update email of SeongJaeSeongJae Park2023-12-207-7/+7
| | | | | | | | | | | | | | | | | | Patch series "mm/damon: misc updates for 6.8". Update comments, tests, and documents for DAMON. This patch (of 6): SeongJae is using his kernel.org account for DAMON development. Update the old email addresses on the comments of DAMON source files. Link: https://lkml.kernel.org/r/20231213190338.54146-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231213190338.54146-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* sync mm-stable with mm-hotfixes-stable to pick up depended-upon changesAndrew Morton2023-12-201-0/+6
|\
| * mm/damon/core: make damon_start() waits until kdamond_fn() startsSeongJae Park2023-12-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cleanup tasks of kdamond threads including reset of corresponding DAMON context's ->kdamond field and decrease of global nr_running_ctxs counter is supposed to be executed by kdamond_fn(). However, commit 0f91d13366a4 ("mm/damon: simplify stop mechanism") made neither damon_start() nor damon_stop() ensure the corresponding kdamond has started the execution of kdamond_fn(). As a result, the cleanup can be skipped if damon_stop() is called fast enough after the previous damon_start(). Especially the skipped reset of ->kdamond could cause a use-after-free. Fix it by waiting for start of kdamond_fn() execution from damon_start(). Link: https://lkml.kernel.org/r/20231208175018.63880-1-sj@kernel.org Fixes: 0f91d13366a4 ("mm/damon: simplify stop mechanism") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: Changbin Du <changbin.du@intel.com> Cc: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core-test: add a unit test for the feedback loop algorithmSeongJae Park2023-12-121-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement a simple kunit test for testing the behavior of the feedback loop algorithm for the aim-oriented feedback-friven DAMOS aggressiveness auto tuning. Link: https://lkml.kernel.org/r/20231130023652.50284-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: implement a command for scheme quota goals only commitSeongJae Park2023-12-123-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To update DAMOS quota goals, users need to enter 'commit' command to the 'state' file of the kdamond, which applies not only the goals but entire inputs. It is inefficient. Implement yet another 'state' file input command for reading and committing only the scheme quota goals, namely 'commit_schemes_quota_goals'. Link: https://lkml.kernel.org/r/20231130023652.50284-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: commit damos quota goals user input to DAMOSSeongJae Park2023-12-121-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make DAMON sysfs interface to read the user inputs for DAMOS quota goals and pass those to DAMOS, so that the users can use the quota auto-tuning feature. It uses the DAMON sysfs interface's user input commit mechanism, which applies all user inputs for initial starting of DAMON and online input updates, which can be done by writing 'on' and 'commit' to the kdamond's 'state' file, respectively. In other words, the user should periodically write appropriate value to 'current_value' files and 'commit' command to the 'state' file. 'target_value' files could also be similarly updated at any time. Note that the interface is supporting multiple goals while the core logic supports only one goal. DAMON sysfs interface passes only best feedback among the given inputs, to avoid making DAMOS too aggressive. Link: https://lkml.kernel.org/r/20231130023652.50284-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: implement files for scheme quota goals setupSeongJae Park2023-12-121-3/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement DAMON sysfs directories and files for the goals of DAMOS quota. Those allow users set multiple goals for their aim, with target values. Users can further enter the current score value for each goal as feedback for DAMOS. Note that this commit is implementing only the basic file operations, and not connecting the files with the DAMOS core logic. Hence writing something to the files makes no real effect. The following commit will connect the file operations and the core logic. Link: https://lkml.kernel.org/r/20231130023652.50284-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: implement goal-oriented feedback-driven quota auto-tuningSeongJae Park2023-12-121-9/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon: let users feed and tame/auto-tune DAMOS". Introduce Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning. It makes DAMOS self-tuned with periodic simple user feedback. Background: DAMOS Control Difficulty ==================================== DAMOS helps users easily implement access pattern aware system operations. However, controlling DAMOS in the wild is not that easy. The basic way for DAMOS control is specifying the target access pattern. In this approach, the user is assumed to well understand the access pattern and the characteristics of the system and the workloads. Though there are useful tools for that, it takes time and effort depending on the complexity and the dynamicity of the system and the workloads. After all, the access pattern consists of three ranges, namely the size, the access rate, and the age of the regions. It means users need to tune six parameters, which is anyway not a simple task. One of the worst cases would be DAMOS being too aggressive like a berserker, and therefore consuming too much system resource and making unwanted radical system operations. To let users avoid such cases, DAMOS allows users to set the upper-limit of the schemes' aggressiveness, namely DAMOS quota. DAMOS further provides its best-effort under the limit by prioritizing regions based on the access pattern of the regions. For example, users can ask DAMOS to page out up to 100 MiB of memory regions per second. Then DAMOS pages out regions that are not accessed for a longer time (colder) first under the limit. This allows users to set the target access pattern a bit naive with wider ranges, and focus on tuning only one parameter, the quota. In other words, the number of parameters to tune can be reduced from six to one. Still, however, the optimum value for the quota depends on the system and the workloads' characteristics, so not that simple. The number of parameters to tune can also increase again if the user needs to run multiple schemes. Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning ============================================================= Users would use DAMOS since they want to achieve something with it. They will likely have measurable metrics representing the achievement and the target number of the metric like SLO, and continuously measure that anyway. While the additional cost of getting the information is nearly zero, it could be useful for DAMOS to understand how appropriate its current aggressiveness is set, and adjust it on its own to make the metric value more close to the target. Based on this idea, we introduce a new way of tuning DAMOS with nearly zero additional effort, namely Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning. It asks users to provide feedback representing how well DAMOS is doing relative to the users' aim. Then DAMOS adjusts its aggressiveness, specifically the quota that provides the best effort result under the limit, based on the current level of the aggressiveness and the users' feedback. Implementation ============== The implementation asks users to represent the feedback with score numbers. The scores could be anything including user-space specific metrics including latency and throughput of special user-space workloads, and system metrics including free memory ratio, memory pressure stall time (PSI), and active to inactive LRU lists size ratio. The feedback scores and the aggressiveness of the given DAMOS scheme are assumed to be positively proportional, though. Selecting metrics of the assumption is the users' responsibility. The core logic uses the below simple feedback loop algorithm to calculate the next aggressiveness level of the scheme from the current aggressiveness level and the current feedback (target_score and current_score). It calculates the compensation for next aggressiveness as a proportion of current aggressiveness and distance to the target score. As a result, it arrives at the near-goal state in a short time using big steps when it's far from the goal, but avoids making unnecessarily radical changes that could turn out to be a bad decision using small steps when its near to the goal. f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1)) Note that the compensation value becomes negative when it's over achieving the goal. That's why the feedback metric and the aggressiveness of the scheme should be positively proportional. The distance-adaptive speed manipulation is simply applied. Example Use Cases ================= If users want to reduce the memory footprint of the system as much as possible as long as the time spent for handling the resulting memory pressure is within a threshold, they could use DAMOS scheme that reclaims cold memory regions aiming for a little level of memory pressure stall time. If users want the active/inactive LRU lists well balanced to reduce the performance impact due to possible future memory pressure, they could use two schemes. The first one would be set to locate hot pages in the active LRU list, aiming for a specific active-to-inactive LRU list size ratio, say, 70%. The second one would be to locate cold pages in the inactive LRU list, aiming for a specific inactive-to-active LRU list size ratio, say, 30%. Then, DAMOS will balance the two schemes based on the goal and feedback. This aim-oriented auto tuning could also be useful for general balancing-required access aware system operations such as system memory auto scaling[3] and tiered memory management[4]. These two example usages are not what current DAMOS implementation is already supporting, but require additional DAMOS action developments, though. Evaluation: subtle memory pressure aiming proactive reclamation =============================================================== To show if the implementation works as expected, we prepare four different system configurations on AWS i3.metal instances. The first setup (original) runs the workload without any DAMOS scheme. The second setup (not-tuned) runs the workload with a virtual address space-based proactive reclamation scheme that pages out memory regions that are not accessed for five seconds or more. The third setup (offline-tuned) runs the same proactive reclamation DAMOS scheme, but after making it tuned for each workload offline, using our previous user-space driven automatic tuning approach, namely DAMOOS[1]. The fourth and final setup (AFDAA) runs the scheme that is the same as that of 'not-tuned' setup, but aims to keep 0.5% of 'some' memory pressure stall time (PSI) for the last 10 seconds using the aiming-oriented auto tuning. For each setup, we run realistic workloads from PARSEC3 and SPLASH-2X benchmark suites. For each run, we measure RSS and runtime of the workload, and 'some' memory pressure stall time (PSI) of the system. We repeat the runs five times and use averaged measurements. For simple comparison of the results, we normalize the measurements to those of 'original'. In the case of the PSI, though, the measurement for 'original' was zero, so we normalize the value to that of 'not-tuned' scheme's result. The normalized results are shown below. Not-tuned Offline-tuned AFDAA RSS 0.622688178226118 0.787950678944904 0.740093483278979 runtime 1.11767826657912 1.0564674983585 1.0910833880499 PSI 1 0.727521443794069 0.308498846350299 The 'not-tuned' scheme achieves about 38.7% memory saving but incur about 11.7% runtime slowdown. The 'offline-tuned' scheme achieves about 22.2% memory saving with about 5.5% runtime slowdown. It also achieves about 28.2% memory pressure stall time saving. AFDAA achieves about 26% memory saving with about 9.1% runtime slowdown. It also achieves about 69.1% memory pressure stall time saving. We repeat this test multiple times, and get consistent results. AFDAA is now integrated in our daily DAMON performance test setup. Apparently the aggressiveness of 'AFDAA' setup is somewhere between those of 'not-tuned' and 'offline-tuned' setup, since its memory saving and runtime overhead are between those of the other two setups. Actually we set the memory pressure stall time goal aiming for this middle aggressiveness. The difference in the two metrics are not significant, though. However, it shows significant saving of the memory pressure stall time, which was the goal of the auto-tuning, over the two variants. Hence, we conclude the automatic tuning is working as expected. Please note that the AFDAA setup is only for the evaluation, and therefore intentionally set a bit aggressive. It might not be appropriate for production environments. The test code is also available[2], so you could reproduce it on your system and workloads. Patches Sequence ================ The first four patches implement the core logic and user interfaces for the auto tuning. The first patch implements the core logic for the auto tuning, and the API for DAMOS users in the kernel space. The second patch implements basic file operations of DAMON sysfs directories and files that will be used for setting the goals and providing the feedback. The third patch connects the quota goals files inputs to the DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS sysfs command for efficiently committing the quota goals feedback. Two patches for simple tests of the logic and interfaces follow. The fifth patch implements the core logic unit test. The sixth patch implements a selftest for the DAMON Sysfs interface for the goals. Finally, three patches for documentation follows. The seventh patch documents the design of the feature. The eighth patch updates the API doc for the new sysfs files. The final eighth patch updates the usage document for the features. References ========== [1] DAOS paper: https://www.amazon.science/publications/daos-data-access-aware-operating-system [2] Evaluation code: https://github.com/damonitor/damon-tests/commit/3f884e61193f0166b8724554b6d06b0c449a712d [3] Memory auto scaling RFC idea: https://lore.kernel.org/damon/20231112195114.61474-1-sj@kernel.org/ [4] DAMON-based tiered memory management RFC idea: https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/ This patch (of 9) Users can effectively control the upper-limit aggressiveness of DAMOS schemes using the quota feature. The quota provides best result under the limit by prioritizing regions based on the access pattern. That said, finding the best value, which could depend on dynamic characteristics of the system and the workloads, is still challenging. Implement a simple feedback-driven tuning mechanism and use it for automatic tuning of DAMOS quota. The implementation allows users to provide the feedback by setting a feedback score returning callback function. Then DAMOS periodically calls the function back and adjusts the quota based on the return value of the callback and current quota value. Note that the absolute-value based time/size quotas still work as the maximum hard limits of the scheme's aggressiveness. The feedback-driven auto-tuned quota is applied only if it is not exceeding the manually set maximum limits. Same for the scheme-target access pattern and filters like other features. [sj@kernel.org: document get_score_arg field of struct damos_quota] Link: https://lkml.kernel.org/r/20231204170106.60992-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231130023652.50284-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231130023652.50284-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core-test: test damon_split_region_at()'s access rate copyingSeongJae Park2023-12-101-4/+11
|/ | | | | | | | | | | | damon_split_region_at() should set access rate related fields of the resulting regions same. It may forgotten, and actually there was the mistake before. Test it with the unit test case for the function. Link: https://lkml.kernel.org/r/20231119171529.66863-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regionsSeongJae Park2023-12-061-6/+43
| | | | | | | | | | | | | If a scheme is set to not applied to any monitoring target region for any reasons including the target access pattern, quota, filters, or watermarks, writing 'update_schemes_tried_regions' to 'state' DAMON sysfs file can indefinitely hang. Fix the case by implementing a timeout for the operation. The time limit is two apply intervals of each scheme. Link: https://lkml.kernel.org/r/20231124213840.39157-1-sj@kernel.org Fixes: 4d4e41b68299 ("mm/damon/sysfs-schemes: do not update tried regions more than one DAMON snapshot") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/core: copy nr_accesses when splitting regionSeongJae Park2023-12-061-0/+1
| | | | | | | | | | | | | | | | | | | Regions split function ('damon_split_region_at()') is called at the beginning of an aggregation interval, and when DAMOS applying the actions and charging quota. Because 'nr_accesses' fields of all regions are reset at the beginning of each aggregation interval, and DAMOS was applying the action at the end of each aggregation interval, there was no need to copy the 'nr_accesses' field to the split-out region. However, commit 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") made DAMOS applies action on its own timing interval. Hence, 'nr_accesses' should also copied to split-out regions, but the commit didn't. Fix it by copying it. Link: https://lkml.kernel.org/r/20231119171529.66863-1-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/core.c: avoid unintentional filtering out of schemesHyeongtak Ji2023-11-151-1/+1
| | | | | | | | | | | | | | | | The function '__damos_filter_out()' causes DAMON to always filter out schemes whose filter type is anon or memcg if its matching value is set to false. This commit addresses the issue by ensuring that '__damos_filter_out()' no longer applies to filters whose type is 'anon' or 'memcg'. Link: https://lkml.kernel.org/r/1699594629-3816-1-git-send-email-hyeongtak.ji@gmail.com Fixes: ab9bda001b681 ("mm/damon/core: introduce address range type damos filter") Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/sysfs-schemes: handle tried region directory allocation failureSeongJae Park2023-11-151-0/+2
| | | | | | | | | | | | | | DAMON sysfs interface's before_damos_apply callback (damon_sysfs_before_damos_apply()), which creates the DAMOS tried regions for each DAMOS action applied region, is not handling the allocation failure for the sysfs directory data. As a result, NULL pointer derefeence is possible. Fix it by handling the case. Link: https://lkml.kernel.org/r/20231106233408.51159-4-sj@kernel.org Fixes: f1d13cacabe1 ("mm/damon/sysfs: implement DAMOS tried regions update command") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/sysfs-schemes: handle tried regions sysfs directory allocation failureSeongJae Park2023-11-151-0/+3
| | | | | | | | | | | | | DAMOS tried regions sysfs directory allocation function (damon_sysfs_scheme_regions_alloc()) is not handling the memory allocation failure. In the case, the code will dereference NULL pointer. Handle the failure to avoid such invalid access. Link: https://lkml.kernel.org/r/20231106233408.51159-3-sj@kernel.org Fixes: 9277d0367ba1 ("mm/damon/sysfs-schemes: implement scheme region directory") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/sysfs: check error from damon_sysfs_update_target()SeongJae Park2023-11-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon/sysfs: fix unhandled return values". Some of DAMON sysfs interface code is not handling return values from some functions. As a result, confusing user input handling or NULL-dereference is possible. Check those properly. This patch (of 3): damon_sysfs_update_target() returns error code for failures, but its caller, damon_sysfs_set_targets() is ignoring that. The update function seems making no critical change in case of such failures, but the behavior will look like DAMON sysfs is silently ignoring or only partially accepting the user input. Fix it. Link: https://lkml.kernel.org/r/20231106233408.51159-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231106233408.51159-2-sj@kernel.org Fixes: 19467a950b49 ("mm/damon/sysfs: remove requested targets when online-commit inputs") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [5.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* mm/damon/sysfs: eliminate potential uninitialized variable warningDan Carpenter2023-11-151-1/+1
| | | | | | | | | | | The "err" variable is not initialized if damon_target_has_pid(ctx) is false and sys_target->regions->nr is zero. Link: https://lkml.kernel.org/r/739e6aaf-a634-4e33-98a8-16546379ec9f@moroto.mountain Fixes: 0bcd216c4741 ("mm/damon/sysfs: update monitoring target regions for online input commit") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds2023-11-021-2/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
| * kthread: add kthread_stop_putAndreas Gruenbacher2023-10-041-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a kthread_stop_put() helper that stops a thread and puts its task struct. Use it to replace the various instances of kthread_stop() followed by put_task_struct(). Remove the kthread_stop_put() macro in usbip that is similar but doesn't return the result of kthread_stop(). [agruenba@redhat.com: fix kerneldoc comment] Link: https://lkml.kernel.org/r/20230911111730.2565537-1-agruenba@redhat.com [akpm@linux-foundation.org: document kthread_stop_put()'s argument] Link: https://lkml.kernel.org/r/20230907234048.2499820-1-agruenba@redhat.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs: update monitoring target regions for online input commitSeongJae Park2023-11-011-17/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When user input is committed online, DAMON sysfs interface is ignoring the user input for the monitoring target regions. Such request is valid and useful for fixed monitoring target regions-based monitoring ops like 'paddr' or 'fvaddr'. Update the region boundaries as user specified, too. Note that the monitoring results of the regions that overlap between the latest monitoring target regions and the new target regions are preserved. Treat empty monitoring target regions user request as a request to just make no change to the monitoring target regions. Otherwise, users should set the monitoring target regions same to current one for every online input commit, and it could be challenging for dynamic monitoring target regions update DAMON ops like 'vaddr'. If the user really need to remove all monitoring target regions, they can simply remove the target and then create the target again with empty target regions. Link: https://lkml.kernel.org/r/20231031170131.46972-1-sj@kernel.org Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [5.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs: remove requested targets when online-commit inputsSeongJae Park2023-11-011-34/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | damon_sysfs_set_targets(), which updates the targets of the context for online commitment, do not remove targets that removed from the corresponding sysfs files. As a result, more than intended targets of the context can exist and hence consume memory and monitoring CPU resource more than expected. Fix it by removing all targets of the context and fill up again using the user input. This could cause unnecessary memory dealloc and realloc operations, but this is not a hot code path. Also, note that damon_target is stateless, and hence no data is lost. [sj@kernel.org: fix unnecessary monitoring results removal] Link: https://lkml.kernel.org/r/20231028213353.45397-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231022210735.46409-2-sj@kernel.org Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: <stable@vger.kernel.org> [5.19.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()SeongJae Park2023-10-253-0/+100
| | | | | | | | | | | | | | | | | | | | | | damon_sysfs_set_targets() had a bug that can result in unexpected memory usage and monitoring overhead increase. The bug has fixed by a previous commit. Add a unit test for avoiding a similar bug of future. Link: https://lkml.kernel.org/r/20231022210735.46409-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: avoid divide-by-zero from pseudo-moving window length calculationSeongJae Park2023-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating the pseudo-moving access rate, DAMON divides some values by the maximum nr_accesses. However, due to the type of the related variables, simple division-based calculation of the divisor can return zero. As a result, divide-by-zero is possible. Fix it by using damon_max_nr_accesses(), which handles the case. Note that this is a fix for a commit that not in the mainline but mm tree. Link: https://lkml.kernel.org/r/20231019194924.100347-6-sj@kernel.org Fixes: ace30fb21af5 ("mm/damon/core: use pseudo-moving sum for nr_accesses_bp") Reported-by: Jakub Acs <acsjakub@amazon.de> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/lru_sort: avoid divide-by-zero in hot threshold calculationSeongJae Park2023-10-251-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating the hotness threshold for lru_prio scheme of DAMON_LRU_SORT, the module divides some values by the maximum nr_accesses. However, due to the type of the related variables, simple division-based calculation of the divisor can return zero. As a result, divide-by-zero is possible. Fix it by using damon_max_nr_accesses(), which handles the case. Link: https://lkml.kernel.org/r/20231019194924.100347-5-sj@kernel.org Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [6.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/ops-common: avoid divide-by-zero during region hotness calculationSeongJae Park2023-10-251-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating the hotness of each region for the under-quota regions prioritization, DAMON divides some values by the maximum nr_accesses. However, due to the type of the related variables, simple division-based calculation of the divisor can return zero. As a result, divide-by-zero is possible. Fix it by using damon_max_nr_accesses(), which handles the case. Link: https://lkml.kernel.org/r/20231019194924.100347-4-sj@kernel.org Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: avoid divide-by-zero during monitoring results updateSeongJae Park2023-10-251-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When monitoring attributes are changed, DAMON updates access rate of the monitoring results accordingly. For that, it divides some values by the maximum nr_accesses. However, due to the type of the related variables, simple division-based calculation of the divisor can return zero. As a result, divide-by-zero is possible. Fix it by using damon_max_nr_accesses(), which handles the case. Link: https://lkml.kernel.org/r/20231019194924.100347-3-sj@kernel.org Fixes: 2f5bef5a590b ("mm/damon/core: update monitoring results for new monitoring attributes") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [6.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs: avoid empty scheme tried regions for large apply intervalSeongJae Park2023-10-183-4/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAMON_SYSFS assumes all schemes will be applied for at least one DAMON monitoring results snapshot within one aggregation interval, or makes no sense to wait for it while DAMON is deactivated by the watermarks. That for deactivated status still makes sense, but the aggregation interval based assumption is invalid now because each scheme can has its own apply interval. For schemes having larger than the aggregation or watermarks check interval, DAMOS tried regions update request can be finished without the update. Avoid the case by explicitly checking the status of the schemes tried regions update and watermarks based DAMON deactivation. Link: https://lkml.kernel.org/r/20231012192256.33556-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: do not update tried regions more than one DAMON snapshotSeongJae Park2023-10-181-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval". DAMOS tried regions update feature of DAMON sysfs interface is doing the update for one aggregation interval after the request is made. Since the per-scheme apply interval is supported, that behavior makes no much sense. That is, the tried regions directory will have regions from multiple DAMON monitoring results snapshots, or no region for apply intervals that much shorter than, or longer than the aggregation interval, respectively. Update the behavior to update the regions for each scheme for only its apply interval, and update the document. Since DAMOS apply interval is the aggregation by default, this change makes no visible behavioral difference to old users who don't explicitly set the apply intervals. Patches Sequence ---------------- The first two patches makes schemes of apply intervals that much shorter or longer than the aggregation interval to keep the maximum and minimum times for continuing the update. After the two patches, the update aligns with the each scheme's apply interval. Finally, the third patch updates the document to reflect the behavior. This patch (of 3): DAMON_SYSFS exposes every DAMON-found region that eligible for applying the scheme action for one aggregation interval. However, each DAMON-based operation scheme has its own apply interval. Hence, for a scheme that having its apply interval much smaller than the aggregation interval, DAMON_SYSFS will expose the scheme regions that applied to more than one DAMON monitoring results snapshots. Since the purpose of DAMON tried regions is exposing single snapshot, this makes no much sense. Track progress of each scheme's tried regions update and avoid the case. Link: https://lkml.kernel.org/r/20231012192256.33556-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231012192256.33556-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.Andrew Morton2023-10-181-2/+5
|\ \
| * | mm/damon/sysfs: check DAMOS regions update progress from before_terminate()SeongJae Park2023-10-181-2/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAMON_SYSFS can receive DAMOS tried regions update request while kdamond is already out of the main loop and before_terminate callback (damon_sysfs_before_terminate() in this case) is not yet called. And damon_sysfs_handle_cmd() can further be finished before the callback is invoked. Then, damon_sysfs_before_terminate() unlocks damon_sysfs_lock, which is not locked by anyone. This happens because the callback function assumes damon_sysfs_cmd_request_callback() should be called before it. Check if the assumption was true before doing the unlock, to avoid this problem. Link: https://lkml.kernel.org/r/20231007200432.3110-1-sj@kernel.org Fixes: f1d13cacabe1 ("mm/damon/sysfs: implement DAMOS tried regions update command") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.2.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: remove unnecessary si_meminfo invoke.Huan Yang2023-10-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | si_meminfo() will read and assign more info not just free/ram pages. For just DAMOS_WMARK_FREE_MEM_RATE use, only get free and ram pages is ok to save cpu. Link: https://lkml.kernel.org/r/20230920015727.4482-1-link@vivo.com Signed-off-by: Huan Yang <link@vivo.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core-test: fix memory leak in damon_new_ctx()Jinjie Ruan2023-10-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_DAMON_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected. The damon_ctx which is allocated by kzalloc() in damon_new_ctx() in damon_test_ops_registration() and damon_test_set_attrs() are not freed. So use damon_destroy_ctx() to free it. After applying this patch, the following memory leak is never detected unreferenced object 0xffff2b49c6968800 (size 512): comm "kunit_try_catch", pid 350, jiffies 4294895294 (age 557.028s) hex dump (first 32 bytes): 88 13 00 00 00 00 00 00 a0 86 01 00 00 00 00 00 ................ 00 87 93 03 00 00 00 00 0a 00 00 00 00 00 00 00 ................ backtrace: [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368 [<0000000073acab3b>] __kmem_cache_alloc_node+0x174/0x290 [<00000000b5f89cef>] kmalloc_trace+0x40/0x164 [<00000000eb19e83f>] damon_new_ctx+0x28/0xb4 [<00000000daf6227b>] damon_test_ops_registration+0x34/0x328 [<00000000559c4801>] kunit_try_run_case+0x50/0xac [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c [<000000003c3e9211>] kthread+0x124/0x130 [<0000000028f85bdd>] ret_from_fork+0x10/0x20 unreferenced object 0xffff2b49c1a9cc00 (size 512): comm "kunit_try_catch", pid 356, jiffies 4294895306 (age 557.000s) hex dump (first 32 bytes): 88 13 00 00 00 00 00 00 a0 86 01 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00 ................ backtrace: [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368 [<0000000073acab3b>] __kmem_cache_alloc_node+0x174/0x290 [<00000000b5f89cef>] kmalloc_trace+0x40/0x164 [<00000000eb19e83f>] damon_new_ctx+0x28/0xb4 [<00000000058495c4>] damon_test_set_attrs+0x30/0x1a8 [<00000000559c4801>] kunit_try_run_case+0x50/0xac [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c [<000000003c3e9211>] kthread+0x124/0x130 [<0000000028f85bdd>] ret_from_fork+0x10/0x20 Link: https://lkml.kernel.org/r/20230918120951.2230468-3-ruanjinjie@huawei.com Fixes: d1836a3b2a9a ("mm/damon/core-test: initialise context before test in damon_test_set_attrs()") Fixes: 4f540f5ab4f2 ("mm/damon/core-test: add a kunit test case for ops registration") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Feng Tang <feng.tang@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core-test: fix memory leak in damon_new_region()Jinjie Ruan2023-10-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon/core-test: Fix memory leaks in core-test", v3. There are a few memory leaks in core-test which are detected by kmemleak. This patchset fixes the issues. This patch (of 2): When CONFIG_DAMON_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected. The damon_region which is allocated by kmem_cache_alloc() in damon_new_region() in damon_test_regions() and damon_test_update_monitoring_result() are not freed. So for damon_test_regions(), replace damon_del_region() call with damon_destroy_region() so that it calls both damon_del_region() and damon_free_region(), the latter will free the damon_region. For damon_test_update_monitoring_result(), call damon_free_region() to free it. After applying this patch, the following memory leak is never detected. unreferenced object 0xffff2b49c3edc000 (size 56): comm "kunit_try_catch", pid 338, jiffies 4294895280 (age 557.084s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 49 2b ff ff ............I+.. backtrace: [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368 [<00000000b528f67c>] kmem_cache_alloc+0x168/0x284 [<000000008603f022>] damon_new_region+0x28/0x54 [<00000000a3b8c64e>] damon_test_regions+0x38/0x270 [<00000000559c4801>] kunit_try_run_case+0x50/0xac [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c [<000000003c3e9211>] kthread+0x124/0x130 [<0000000028f85bdd>] ret_from_fork+0x10/0x20 unreferenced object 0xffff2b49c5b20000 (size 56): comm "kunit_try_catch", pid 354, jiffies 4294895304 (age 556.988s) hex dump (first 32 bytes): 03 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 96 00 00 00 49 2b ff ff ............I+.. backtrace: [<0000000088e71769>] slab_post_alloc_hook+0xb8/0x368 [<00000000b528f67c>] kmem_cache_alloc+0x168/0x284 [<000000008603f022>] damon_new_region+0x28/0x54 [<00000000ca019f80>] damon_test_update_monitoring_result+0x18/0x34 [<00000000559c4801>] kunit_try_run_case+0x50/0xac [<000000003932ed49>] kunit_generic_run_threadfn_adapter+0x20/0x2c [<000000003c3e9211>] kthread+0x124/0x130 [<0000000028f85bdd>] ret_from_fork+0x10/0x20 Link: https://lkml.kernel.org/r/20230918120951.2230468-1-ruanjinjie@huawei.com Link: https://lkml.kernel.org/r/20230918120951.2230468-2-ruanjinjie@huawei.com Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests") Fixes: f4c978b6594b ("mm/damon/core-test: add a test for damon_update_monitoring_results()") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: support DAMOS apply intervalSeongJae Park2023-10-041-4/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Update DAMON sysfs interface to support DAMOS apply intervals by adding a new file, 'apply_interval_us' in each scheme directory. Users can set and get the interval for each scheme in microseconds by writing to and reading from the file. Link: https://lkml.kernel.org/r/20230916020945.47296-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: implement scheme-specific apply intervalSeongJae Park2023-10-045-9/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAMON-based operation schemes are applied for every aggregation interval. That was mainly because schemes were using nr_accesses, which be complete to be used for every aggregation interval. However, the schemes are now using nr_accesses_bp, which is updated for each sampling interval in a way that reasonable to be used. Therefore, there is no reason to apply schemes for each aggregation interval. The unnecessary alignment with aggregation interval was also making some use cases of DAMOS tricky. Quotas setting under long aggregation interval is one such example. Suppose the aggregation interval is ten seconds, and there is a scheme having CPU quota 100ms per 1s. The scheme will actually uses 100ms per ten seconds, since it cannobe be applied before next aggregation interval. The feature is working as intended, but the results might not that intuitive for some users. This could be fixed by updating the quota to 1s per 10s. But, in the case, the CPU usage of DAMOS could look like spikes, and would actually make a bad effect to other CPU-sensitive workloads. Implement a dedicated timing interval for each DAMON-based operation scheme, namely apply_interval. The interval will be sampling interval aligned, and each scheme will be applied for its apply_interval. The interval is set to 0 by default, and it means the scheme should use the aggregation interval instead. This avoids old users getting any behavioral difference. Link: https://lkml.kernel.org/r/20230916020945.47296-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/sysfs-schemes: use nr_accesses_bp as the source of ↵SeongJae Park2023-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tried_regions/<N>/nr_accesses DAMON sysfs interface exposes access rate of each region via DAMOS tried regions directory. For this, the nr_accesses field of the region is used. DAMOS was actually using nr_accesses in the past, but it uses nr_accesses_bp now. Use the value that it is really using as the source. Note that this doesn't expose nr_accesses_bp as is (in basis point), but after converting it to the natural number by dividing the value by 10,000. Hence there is no behavioral change from users' perspective. Link: https://lkml.kernel.org/r/20230916020945.47296-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: make DAMOS uses nr_accesses_bp instead of nr_accessesSeongJae Park2023-10-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon: implement DAMOS apply intervals". DAMON-based operation schemes are applied for every aggregation interval. That is mainly because schemes are using nr_accesses, which be complete to be used for every aggregation interval. This makes some DAMOS use cases be tricky. Quota setting under long aggregation interval is one such example. Suppose the aggregation interval is ten seconds, and there is a scheme having CPU quota 100ms per 1s. The scheme will actually uses 100ms per ten seconds, since it cannobe be applied before next aggregation interval. The feature is working as intended, but the results might not that intuitive for some users. This could be fixed by updating the quota to 1s per 10s. But, in the case, the CPU usage of DAMOS could look like spikes, and actually make a bad effect to other CPU-sensitive workloads. Also, with such huge aggregation interval, users may want schemes to be applied more frequently. DAMON provides nr_accesses_bp, which is updated for each sampling interval in a way that reasonable to be used. By using that instead of nr_accesses, DAMOS can have its own time interval and mitigate abovely mentioned issues. This patchset makes DAMOS schemes to use nr_accesses_bp instead of nr_accesses, and have their own timing intervals. Also update DAMOS tried regions sysfs files and DAMOS before_apply tracepoint to use the new data as their source. Note that the interval is zero by default, and it is interpreted to use the aggregation interval instead. This avoids making user-visible behavioral changes. Patches Seuqeunce ----------------- The first patch (patch 1/9) makes DAMOS uses nr_accesses_bp instead of nr_accesses, and following two patches (patches 2/9 and 3/9) updates DAMON sysfs interface for DAMOS tried regions and the DAMOS before_apply tracespoint to use nr_accesses_bp instead of nr_accesses, respectively. The following two patches (patches 4/9 and 5/9) implements the scheme-specific apply interval for DAMON kernel API users and update the design document for the new feature. Finally, the following four patches (patches 6/9, 7/9, 8/9 and 9/9) add support of the feature in DAMON sysfs interface, add a simple selftest test case, and document the new file on the usage and the ABI documents, repsectively. This patch (of 9): DAMON provides nr_accesses_bp, which becomes same to nr_accesses * 10000 for every aggregation interval, but updated every sampling interval with a reasonable accuracy. Since DAMON-based operation schemes are applied in every aggregation interval using nr_accesses, using nr_accesses_bp instead will make no difference to users. Meanwhile, it allows DAMOS to apply the schemes in a time interval that less than the aggregation interval. It could be useful and more flexible for some cases. Do it. Link: https://lkml.kernel.org/r/20230916020945.47296-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230916020945.47296-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: mark damon_moving_sum() as a static functionSeongJae Park2023-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | The function is used by only mm/damon/core.c. Mark it as a static function. Link: https://lkml.kernel.org/r/20230915025251.72816-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: skip updating nr_accesses_bp for each aggregation intervalSeongJae Park2023-10-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | damon_merge_regions_of(), which is called for each aggregation interval, updates nr_accesses_bp to nr_accesses * 10000. However, nr_accesses_bp is updated for each sampling interval via damon_moving_sum() using the aggregation interval as the moving time window. And by the definition of the algorithm, the value becomes same to discrete-window based sum for each time window-aligned time. Hence, nr_accesses_bp will be same to nr_accesses * 10000 for each aggregation interval without explicit update. Remove the unnecessary update of nr_accesses_bp in damon_merge_regions_of(). Link: https://lkml.kernel.org/r/20230915025251.72816-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: use pseudo-moving sum for nr_accesses_bpSeongJae Park2023-10-043-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | Let nr_accesses_bp be calculated as a pseudo-moving sum that updated for every sampling interval, using damon_moving_sum(). This is assumed to be useful for cases that the aggregation interval is set quite huge, but the monivoting results need to be collected earlier than next aggregation interval is passed. Link: https://lkml.kernel.org/r/20230915025251.72816-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: introduce nr_accesses_bpSeongJae Park2023-10-042-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add yet another representation of the access rate of each region, namely nr_accesses_bp. It is just same to the nr_accesses but represents the value in basis point (1 in 10,000), and updated at once in every aggregation interval. That is, moving_accesses_bp is just nr_accesses * 10000. This may seems useless at the moment. However, it will be useful for representing less than one nr_accesses value that will be needed to make moving sum-based nr_accesses. Link: https://lkml.kernel.org/r/20230915025251.72816-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core-test: add a unit test for damon_moving_sum()SeongJae Park2023-10-041-0/+16
| | | | | | | | | | | | | | | | | | | | Add a simple unit test for the pseudo moving-sum function (damon_moving_sum()). Link: https://lkml.kernel.org/r/20230915025251.72816-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: implement a pseudo-moving sum functionSeongJae Park2023-10-041-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For values that continuously change, moving average or sum are good ways to provide fast updates while handling temporal and errorneous variability of the value. For example, the access rate counter (nr_accesses) is calculated as a sum of the number of positive sampled access check results that collected during a discrete time window (aggregation interval), and hence it handles temporal and errorneous access check results, but provides the update only for every aggregation interval. Using a moving sum method for that could allow providing the value for every sampling interval. That could be useful for getting monitoring results snapshot or running DAMOS in fine-grained timing. However, supporting the moving sum for cases that number of samples in the time window is arbirary could impose high overhead, since the number of past values that it needs to keep could be too high. The nr_accesses would also be one of the cases. To mitigate the overhead, implement a pseudo-moving sum function that only provides an estimated pseudo-moving sum. It assumes there was no error in last discrete time window and subtract constant portion of last discrete time window sum. Note that the function is not strictly implementing the moving sum, but it keeps a property of moving sum, which makes the value same to the dsicrete-window based sum for each time window-aligned timing. Hence, people collecting the value in the old timings would show no difference. Link: https://lkml.kernel.org/r/20230915025251.72816-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/vaddr: call damon_update_region_access_rate() alwaysSeongJae Park2023-10-041-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When getting mm_struct of the monitoring target process fails, there wil be no need to increase the access rate counter (nr_accesses) of the regions for the process. Hence, damon_va_check_accesses() skips calling damon_update_region_access_rate() in the case. This breaks the assumption that damon_update_region_access_rate() is called for every region, for every sampling interval. Call the function for every region even in the case. This might increase the overhead in some cases, but such case would not be frequent, so no significant impact is really expected. Link: https://lkml.kernel.org/r/20230915025251.72816-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: define and use a dedicated function for region access rate updateSeongJae Park2023-10-043-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon: provide pseudo-moving sum based access rate". DAMON checks the access to each region for every sampling interval, increase the access rate counter of the region, namely nr_accesses, if the access was made. For every aggregation interval, the counter is reset. The counter is exposed to users to be used as a metric showing the relative access rate (frequency) of each region. In other words, DAMON provides access rate of each region in every aggregation interval. The aggregation avoids temporal access pattern changes making things confusing. However, this also makes a few DAMON-related operations to unnecessarily need to be aligned to the aggregation interval. This can restrict the flexibility of DAMON applications, especially when the aggregation interval is huge. To provide the monitoring results in finer-grained timing while keeping handling of temporal access pattern change, this patchset implements a pseudo-moving sum based access rate metric. It is pseudo-moving sum because strict moving sum implementation would need to keep all values for last time window, and that could incur high overhead of there could be arbitrary number of values in a time window. Especially in case of the nr_accesses, since the sampling interval and aggregation interval can arbitrarily set and the past values should be maintained for every region, it could be risky. The pseudo-moving sum assumes there were no temporal access pattern change in last discrete time window to remove the needs for keeping the list of the last time window values. As a result, it beocmes not strict moving sum implementation, but provides a reasonable accuracy. Also, it keeps an important property of the moving sum. That is, the moving sum becomes same to discrete-window based sum at the time that aligns to the time window. This means using the pseudo moving sum based nr_accesses makes no change to users who shows the value for every aggregation interval. Patches Sequence ---------------- The sequence of the patches is as follows. The first four patches are for preparation of the change. The first two (patches 1 and 2) implements a helper function for nr_accesses update and eliminate corner case that skips use of the function, respectively. Following two (patches 3 and 4) respectively implement the pseudo-moving sum function and its simple unit test case. Two patches for making DAMON to use the pseudo-moving sum follow. The fifthe one (patch 5) introduces a new field for representing the pseudo-moving sum-based access rate of each region, and the sixth one makes the new representation to actually updated with the pseudo-moving sum function. Last two patches (patches 7 and 8) makes followup fixes for skipping unnecessary updates and marking the moving sum function as static, respectively. This patch (of 8): Each DAMON operarions set is updating nr_accesses field of each damon_region for each of their access check results, from the check_accesses() callback. Directly accessing the field could make things complex to manage and change in future. Define and use a dedicated function for the purpose. Link: https://lkml.kernel.org/r/20230915025251.72816-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230915025251.72816-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: use number of passed access sampling as a timerSeongJae Park2023-10-041-49/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAMON sleeps for sampling interval after each sampling, and check if the aggregation interval and the ops update interval have passed using ktime_get_coarse_ts64() and baseline timestamps for the intervals. That design is for making the operations occur at deterministic timing regardless of the time that spend for each work. However, it turned out it is not that useful, and incur not-that-intuitive results. After all, timer functions, and especially sleep functions that DAMON uses to wait for specific timing, are not necessarily strictly accurate. It is legal design, so no problem. However, depending on such inaccuracies, the nr_accesses can be larger than aggregation interval divided by sampling interval. For example, with the default setting (5 ms sampling interval and 100 ms aggregation interval) we frequently show regions having nr_accesses larger than 20. Also, if the execution of a DAMOS scheme takes a long time, next aggregation could happen before enough number of samples are collected. This is not what usual users would intuitively expect. Since access check sampling is the smallest unit work of DAMON, using the number of passed sampling intervals as the DAMON-internal timer can easily avoid these problems. That is, convert aggregation and ops update intervals to numbers of sampling intervals that need to be passed before those operations be executed, count the number of passed sampling intervals, and invoke the operations as soon as the specific amount of sampling intervals passed. Make the change. Note that this could make a behavioral change to settings that using intervals that not aligned by the sampling interval. For example, if the sampling interval is 5 ms and the aggregation interval is 12 ms, DAMON effectively uses 15 ms as its aggregation interval, because it checks whether the aggregation interval after sleeping the sampling interval. This change will make DAMON to effectively use 10 ms as aggregation interval, since it uses 'aggregation interval / sampling interval * sampling interval' as the effective aggregation interval, and we don't use floating point types. Usual users would have used aligned intervals, so this behavioral change is not expected to make any meaningful impact, so just make this change. Link: https://lkml.kernel.org/r/20230914021523.60649-1-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: add a tracepoint for damos apply target regionsSeongJae Park2023-10-041-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm/damon: add a tracepoint for damos apply target regions", v2. DAMON provides damon_aggregated tracepoint to let users record full monitoring results. Sometimes, users need to record monitoring results of specific pattern. DAMOS tried regions directory of DAMON sysfs interface allows it, but the interface is mainly designed for snapshots and therefore would be inefficient for such recording. Implement yet another tracepoint for efficient support of the usecase. This patch (of 2): DAMON provides damon_aggregated tracepoint, which exposes details of each region and its access monitoring results. It is useful for getting whole monitoring results, e.g., for recording purposes. For investigations of DAMOS, DAMON Sysfs interface provides DAMOS statistics and tried_regions directory. But, those provides only statistics and snapshots. If the scheme is frequently applied and if the user needs to know every detail of DAMOS behavior, the snapshot-based interface could be insufficient and expensive. As a last resort, userspace users need to record the all monitoring results via damon_aggregated tracepoint and simulate how DAMOS would worked. It is unnecessarily complicated. DAMON kernel API users, meanwhile, can do that easily via before_damos_apply() callback field of 'struct damon_callback', though. Add a tracepoint that will be called just after before_damos_apply() callback for more convenient investigations of DAMOS. The tracepoint exposes all details about each regions, similar to damon_aggregated tracepoint. Please note that DAMOS is currently not only for memory management but also for query-like efficient monitoring results retrievals (when 'stat' action is used). Until now, only statistics or snapshots were supported. Addition of this tracepoint allows efficient full recording of DAMOS-based filtered monitoring results. Link: https://lkml.kernel.org/r/20230913022050.2109-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230913022050.2109-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> [tracing] Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: remove 'struct target *' parameter from damon_aggregated ↵SeongJae Park2023-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | tracepoint damon_aggregateed tracepoint is receiving 'struct target *', but doesn't use it. Remove it from the prototype. Link: https://lkml.kernel.org/r/20230907022929.91361-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | mm/damon/core: fix a comment about damon_set_attrs() call timingsSeongJae Park2023-10-041-1/+5
|/ | | | | | | | | | | | | The comment on damon_set_attrs() says it should not be called while the kdamond is running, but now some DAMON modules like sysfs interface and DAMON_RECLAIM call it from after_aggregation() and/or after_wmarks_check() callbacks for online tuning. Update the comment. Link: https://lkml.kernel.org/r/20230907022929.91361-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>