diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-19 09:21:03 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-19 09:21:03 -0700 |
commit | 61307b7be41a1f1039d1d1368810a1d92cb97b44 (patch) | |
tree | 639e233e177f8618cd5f86daeb7efc6b095890f0 /lib | |
parent | 0450d2083be6bdcd18c9535ac50c55266499b2df (diff) | |
parent | 76edc534cc289308130272a2ac28694fc9b72a03 (diff) | |
download | linux-61307b7be41a1f1039d1d1368810a1d92cb97b44.tar.gz linux-61307b7be41a1f1039d1d1368810a1d92cb97b44.tar.bz2 linux-61307b7be41a1f1039d1d1368810a1d92cb97b44.zip |
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
Diffstat (limited to 'lib')
-rw-r--r-- | lib/Kconfig.debug | 31 | ||||
-rw-r--r-- | lib/Makefile | 3 | ||||
-rw-r--r-- | lib/alloc_tag.c | 246 | ||||
-rw-r--r-- | lib/codetag.c | 283 | ||||
-rw-r--r-- | lib/rhashtable.c | 22 | ||||
-rw-r--r-- | lib/test_hmm.c | 8 | ||||
-rw-r--r-- | lib/test_xarray.c | 93 | ||||
-rw-r--r-- | lib/xarray.c | 52 |
8 files changed, 707 insertions, 31 deletions
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 8bba448c819b..c1c1b19525a5 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -969,6 +969,37 @@ config DEBUG_STACKOVERFLOW If in doubt, say "N". +config CODE_TAGGING + bool + select KALLSYMS + +config MEM_ALLOC_PROFILING + bool "Enable memory allocation profiling" + default n + depends on PROC_FS + depends on !DEBUG_FORCE_WEAK_PER_CPU + select CODE_TAGGING + select PAGE_EXTENSION + select SLAB_OBJ_EXT + help + Track allocation source code and record total allocation size + initiated at that code location. The mechanism can be used to track + memory leaks with a low performance and memory impact. + +config MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT + bool "Enable memory allocation profiling by default" + default y + depends on MEM_ALLOC_PROFILING + +config MEM_ALLOC_PROFILING_DEBUG + bool "Memory allocation profiler debugging" + default n + depends on MEM_ALLOC_PROFILING + select MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT + help + Adds warnings with helpful error messages for memory allocation + profiling. + source "lib/Kconfig.kasan" source "lib/Kconfig.kfence" source "lib/Kconfig.kmsan" diff --git a/lib/Makefile b/lib/Makefile index 2410ea456f8a..7a1fdd1cce7a 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -233,6 +233,9 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \ of-reconfig-notifier-error-inject.o obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o +obj-$(CONFIG_CODE_TAGGING) += codetag.o +obj-$(CONFIG_MEM_ALLOC_PROFILING) += alloc_tag.o + lib-$(CONFIG_GENERIC_BUG) += bug.o obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c new file mode 100644 index 000000000000..531dbe2f5456 --- /dev/null +++ b/lib/alloc_tag.c @@ -0,0 +1,246 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/alloc_tag.h> +#include <linux/fs.h> +#include <linux/gfp.h> +#include <linux/module.h> +#include <linux/page_ext.h> +#include <linux/proc_fs.h> +#include <linux/seq_buf.h> +#include <linux/seq_file.h> + +static struct codetag_type *alloc_tag_cttype; + +DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag); +EXPORT_SYMBOL(_shared_alloc_tag); + +DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT, + mem_alloc_profiling_key); + +static void *allocinfo_start(struct seq_file *m, loff_t *pos) +{ + struct codetag_iterator *iter; + struct codetag *ct; + loff_t node = *pos; + + iter = kzalloc(sizeof(*iter), GFP_KERNEL); + m->private = iter; + if (!iter) + return NULL; + + codetag_lock_module_list(alloc_tag_cttype, true); + *iter = codetag_get_ct_iter(alloc_tag_cttype); + while ((ct = codetag_next_ct(iter)) != NULL && node) + node--; + + return ct ? iter : NULL; +} + +static void *allocinfo_next(struct seq_file *m, void *arg, loff_t *pos) +{ + struct codetag_iterator *iter = (struct codetag_iterator *)arg; + struct codetag *ct = codetag_next_ct(iter); + + (*pos)++; + if (!ct) + return NULL; + + return iter; +} + +static void allocinfo_stop(struct seq_file *m, void *arg) +{ + struct codetag_iterator *iter = (struct codetag_iterator *)m->private; + + if (iter) { + codetag_lock_module_list(alloc_tag_cttype, false); + kfree(iter); + } +} + +static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct) +{ + struct alloc_tag *tag = ct_to_alloc_tag(ct); + struct alloc_tag_counters counter = alloc_tag_read(tag); + s64 bytes = counter.bytes; + + seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls); + codetag_to_text(out, ct); + seq_buf_putc(out, ' '); + seq_buf_putc(out, '\n'); +} + +static int allocinfo_show(struct seq_file *m, void *arg) +{ + struct codetag_iterator *iter = (struct codetag_iterator *)arg; + char *bufp; + size_t n = seq_get_buf(m, &bufp); + struct seq_buf buf; + + seq_buf_init(&buf, bufp, n); + alloc_tag_to_text(&buf, iter->ct); + seq_commit(m, seq_buf_used(&buf)); + return 0; +} + +static const struct seq_operations allocinfo_seq_op = { + .start = allocinfo_start, + .next = allocinfo_next, + .stop = allocinfo_stop, + .show = allocinfo_show, +}; + +size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sleep) +{ + struct codetag_iterator iter; + struct codetag *ct; + struct codetag_bytes n; + unsigned int i, nr = 0; + + if (can_sleep) + codetag_lock_module_list(alloc_tag_cttype, true); + else if (!codetag_trylock_module_list(alloc_tag_cttype)) + return 0; + + iter = codetag_get_ct_iter(alloc_tag_cttype); + while ((ct = codetag_next_ct(&iter))) { + struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct)); + + n.ct = ct; + n.bytes = counter.bytes; + + for (i = 0; i < nr; i++) + if (n.bytes > tags[i].bytes) + break; + + if (i < count) { + nr -= nr == count; + memmove(&tags[i + 1], + &tags[i], + sizeof(tags[0]) * (nr - i)); + nr++; + tags[i] = n; + } + } + + codetag_lock_module_list(alloc_tag_cttype, false); + + return nr; +} + +static void __init procfs_init(void) +{ + proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op); +} + +static bool alloc_tag_module_unload(struct codetag_type *cttype, + struct codetag_module *cmod) +{ + struct codetag_iterator iter = codetag_get_ct_iter(cttype); + struct alloc_tag_counters counter; + bool module_unused = true; + struct alloc_tag *tag; + struct codetag *ct; + + for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) { + if (iter.cmod != cmod) + continue; + + tag = ct_to_alloc_tag(ct); + counter = alloc_tag_read(tag); + + if (WARN(counter.bytes, + "%s:%u module %s func:%s has %llu allocated at module unload", + ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes)) + module_unused = false; + } + + return module_unused; +} + +#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT +static bool mem_profiling_support __meminitdata = true; +#else +static bool mem_profiling_support __meminitdata; +#endif + +static int __init setup_early_mem_profiling(char *str) +{ + bool enable; + + if (!str || !str[0]) + return -EINVAL; + + if (!strncmp(str, "never", 5)) { + enable = false; + mem_profiling_support = false; + } else { + int res; + + res = kstrtobool(str, &enable); + if (res) + return res; + + mem_profiling_support = true; + } + + if (enable != static_key_enabled(&mem_alloc_profiling_key)) { + if (enable) + static_branch_enable(&mem_alloc_profiling_key); + else + static_branch_disable(&mem_alloc_profiling_key); + } + + return 0; +} +early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling); + +static __init bool need_page_alloc_tagging(void) +{ + return mem_profiling_support; +} + +static __init void init_page_alloc_tagging(void) +{ +} + +struct page_ext_operations page_alloc_tagging_ops = { + .size = sizeof(union codetag_ref), + .need = need_page_alloc_tagging, + .init = init_page_alloc_tagging, +}; +EXPORT_SYMBOL(page_alloc_tagging_ops); + +static struct ctl_table memory_allocation_profiling_sysctls[] = { + { + .procname = "mem_profiling", + .data = &mem_alloc_profiling_key, +#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG + .mode = 0444, +#else + .mode = 0644, +#endif + .proc_handler = proc_do_static_key, + }, + { } +}; + +static int __init alloc_tag_init(void) +{ + const struct codetag_type_desc desc = { + .section = "alloc_tags", + .tag_size = sizeof(struct alloc_tag), + .module_unload = alloc_tag_module_unload, + }; + + alloc_tag_cttype = codetag_register_type(&desc); + if (IS_ERR(alloc_tag_cttype)) + return PTR_ERR(alloc_tag_cttype); + + if (!mem_profiling_support) + memory_allocation_profiling_sysctls[0].mode = 0444; + register_sysctl_init("vm", memory_allocation_profiling_sysctls); + procfs_init(); + + return 0; +} +module_init(alloc_tag_init); diff --git a/lib/codetag.c b/lib/codetag.c new file mode 100644 index 000000000000..5ace625f2328 --- /dev/null +++ b/lib/codetag.c @@ -0,0 +1,283 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/codetag.h> +#include <linux/idr.h> +#include <linux/kallsyms.h> +#include <linux/module.h> +#include <linux/seq_buf.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> + +struct codetag_type { + struct list_head link; + unsigned int count; + struct idr mod_idr; + struct rw_semaphore mod_lock; /* protects mod_idr */ + struct codetag_type_desc desc; +}; + +struct codetag_range { + struct codetag *start; + struct codetag *stop; +}; + +struct codetag_module { + struct module *mod; + struct codetag_range range; +}; + +static DEFINE_MUTEX(codetag_lock); +static LIST_HEAD(codetag_types); + +void codetag_lock_module_list(struct codetag_type *cttype, bool lock) +{ + if (lock) + down_read(&cttype->mod_lock); + else + up_read(&cttype->mod_lock); +} + +bool codetag_trylock_module_list(struct codetag_type *cttype) +{ + return down_read_trylock(&cttype->mod_lock) != 0; +} + +struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype) +{ + struct codetag_iterator iter = { + .cttype = cttype, + .cmod = NULL, + .mod_id = 0, + .ct = NULL, + }; + + return iter; +} + +static inline struct codetag *get_first_module_ct(struct codetag_module *cmod) +{ + return cmod->range.start < cmod->range.stop ? cmod->range.start : NULL; +} + +static inline +struct codetag *get_next_module_ct(struct codetag_iterator *iter) +{ + struct codetag *res = (struct codetag *) + ((char *)iter->ct + iter->cttype->desc.tag_size); + + return res < iter->cmod->range.stop ? res : NULL; +} + +struct codetag *codetag_next_ct(struct codetag_iterator *iter) +{ + struct codetag_type *cttype = iter->cttype; + struct codetag_module *cmod; + struct codetag *ct; + + lockdep_assert_held(&cttype->mod_lock); + + if (unlikely(idr_is_empty(&cttype->mod_idr))) + return NULL; + + ct = NULL; + while (true) { + cmod = idr_find(&cttype->mod_idr, iter->mod_id); + + /* If module was removed move to the next one */ + if (!cmod) + cmod = idr_get_next_ul(&cttype->mod_idr, + &iter->mod_id); + + /* Exit if no more modules */ + if (!cmod) + break; + + if (cmod != iter->cmod) { + iter->cmod = cmod; + ct = get_first_module_ct(cmod); + } else + ct = get_next_module_ct(iter); + + if (ct) + break; + + iter->mod_id++; + } + + iter->ct = ct; + return ct; +} + +void codetag_to_text(struct seq_buf *out, struct codetag *ct) +{ + if (ct->modname) + seq_buf_printf(out, "%s:%u [%s] func:%s", + ct->filename, ct->lineno, + ct->modname, ct->function); + else + seq_buf_printf(out, "%s:%u func:%s", + ct->filename, ct->lineno, ct->function); +} + +static inline size_t range_size(const struct codetag_type *cttype, + const struct codetag_range *range) +{ + return ((char *)range->stop - (char *)range->start) / + cttype->desc.tag_size; +} + +#ifdef CONFIG_MODULES +static void *get_symbol(struct module *mod, const char *prefix, const char *name) +{ + DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN); + const char *buf; + void *ret; + + seq_buf_printf(&sb, "%s%s", prefix, name); + if (seq_buf_has_overflowed(&sb)) + return NULL; + + buf = seq_buf_str(&sb); + preempt_disable(); + ret = mod ? + (void *)find_kallsyms_symbol_value(mod, buf) : + (void *)kallsyms_lookup_name(buf); + preempt_enable(); + + return ret; +} + +static struct codetag_range get_section_range(struct module *mod, + const char *section) +{ + return (struct codetag_range) { + get_symbol(mod, "__start_", section), + get_symbol(mod, "__stop_", section), + }; +} + +static int codetag_module_init(struct codetag_type *cttype, struct module *mod) +{ + struct codetag_range range; + struct codetag_module *cmod; + int err; + + range = get_section_range(mod, cttype->desc.section); + if (!range.start || !range.stop) { + pr_warn("Failed to load code tags of type %s from the module %s\n", + cttype->desc.section, + mod ? mod->name : "(built-in)"); + return -EINVAL; + } + + /* Ignore empty ranges */ + if (range.start == range.stop) + return 0; + + BUG_ON(range.start > range.stop); + + cmod = kmalloc(sizeof(*cmod), GFP_KERNEL); + if (unlikely(!cmod)) + return -ENOMEM; + + cmod->mod = mod; + cmod->range = range; + + down_write(&cttype->mod_lock); + err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL); + if (err >= 0) { + cttype->count += range_size(cttype, &range); + if (cttype->desc.module_load) + cttype->desc.module_load(cttype, cmod); + } + up_write(&cttype->mod_lock); + + if (err < 0) { + kfree(cmod); + return err; + } + + return 0; +} + +void codetag_load_module(struct module *mod) +{ + struct codetag_type *cttype; + + if (!mod) + return; + + mutex_lock(&codetag_lock); + list_for_each_entry(cttype, &codetag_types, link) + codetag_module_init(cttype, mod); + mutex_unlock(&codetag_lock); +} + +bool codetag_unload_module(struct module *mod) +{ + struct codetag_type *cttype; + bool unload_ok = true; + + if (!mod) + return true; + + mutex_lock(&codetag_lock); + list_for_each_entry(cttype, &codetag_types, link) { + struct codetag_module *found = NULL; + struct codetag_module *cmod; + unsigned long mod_id, tmp; + + down_write(&cttype->mod_lock); + idr_for_each_entry_ul(&cttype->mod_idr, cmod, tmp, mod_id) { + if (cmod->mod && cmod->mod == mod) { + found = cmod; + break; + } + } + if (found) { + if (cttype->desc.module_unload) + if (!cttype->desc.module_unload(cttype, cmod)) + unload_ok = false; + + cttype->count -= range_size(cttype, &cmod->range); + idr_remove(&cttype->mod_idr, mod_id); + kfree(cmod); + } + up_write(&cttype->mod_lock); + } + mutex_unlock(&codetag_lock); + + return unload_ok; +} + +#else /* CONFIG_MODULES */ +static int codetag_module_init(struct codetag_type *cttype, struct module *mod) { return 0; } +#endif /* CONFIG_MODULES */ + +struct codetag_type * +codetag_register_type(const struct codetag_type_desc *desc) +{ + struct codetag_type *cttype; + int err; + + BUG_ON(desc->tag_size <= 0); + + cttype = kzalloc(sizeof(*cttype), GFP_KERNEL); + if (unlikely(!cttype)) + return ERR_PTR(-ENOMEM); + + cttype->desc = *desc; + idr_init(&cttype->mod_idr); + init_rwsem(&cttype->mod_lock); + + err = codetag_module_init(cttype, NULL); + if (unlikely(err)) { + kfree(cttype); + return ERR_PTR(err); + } + + mutex_lock(&codetag_lock); + list_add_tail(&cttype->link, &codetag_types); + mutex_unlock(&codetag_lock); + + return cttype; +} diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 6ae2ba8e06a2..dbbed19f8fff 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -130,7 +130,8 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht, if (ntbl) return ntbl; - ntbl = kzalloc(PAGE_SIZE, GFP_ATOMIC); + ntbl = alloc_hooks_tag(ht->alloc_tag, + kmalloc_noprof(PAGE_SIZE, GFP_ATOMIC|__GFP_ZERO)); if (ntbl && leaf) { for (i = 0; i < PAGE_SIZE / sizeof(ntbl[0]); i++) @@ -157,7 +158,8 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht, size = sizeof(*tbl) + sizeof(tbl->buckets[0]); - tbl = kzalloc(size, gfp); + tbl = alloc_hooks_tag(ht->alloc_tag, + kmalloc_noprof(size, gfp|__GFP_ZERO)); if (!tbl) return NULL; @@ -181,7 +183,9 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht, int i; static struct lock_class_key __key; - tbl = kvzalloc(struct_size(tbl, buckets, nbuckets), gfp); + tbl = alloc_hooks_tag(ht->alloc_tag, + kvmalloc_node_noprof(struct_size(tbl, buckets, nbuckets), + gfp|__GFP_ZERO, NUMA_NO_NODE)); size = nbuckets; @@ -1016,7 +1020,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed) * .obj_hashfn = my_hash_fn, * }; */ -int rhashtable_init(struct rhashtable *ht, +int rhashtable_init_noprof(struct rhashtable *ht, const struct rhashtable_params *params) { struct bucket_table *tbl; @@ -1031,6 +1035,8 @@ int rhashtable_init(struct rhashtable *ht, spin_lock_init(&ht->lock); memcpy(&ht->p, params, sizeof(*params)); + alloc_tag_record(ht->alloc_tag); + if (params->min_size) ht->p.min_size = roundup_pow_of_two(params->min_size); @@ -1076,7 +1082,7 @@ int rhashtable_init(struct rhashtable *ht, return 0; } -EXPORT_SYMBOL_GPL(rhashtable_init); +EXPORT_SYMBOL_GPL(rhashtable_init_noprof); /** * rhltable_init - initialize a new hash list table @@ -1087,15 +1093,15 @@ EXPORT_SYMBOL_GPL(rhashtable_init); * * See documentation for rhashtable_init. */ -int rhltable_init(struct rhltable *hlt, const struct rhashtable_params *params) +int rhltable_init_noprof(struct rhltable *hlt, const struct rhashtable_params *params) { int err; - err = rhashtable_init(&hlt->ht, params); + err = rhashtable_init_noprof(&hlt->ht, params); hlt->ht.rhlist = true; return err; } -EXPORT_SYMBOL_GPL(rhltable_init); +EXPORT_SYMBOL_GPL(rhltable_init_noprof); static void rhashtable_free_one(struct rhashtable *ht, struct rhash_head *obj, void (*free_fn)(void *ptr, void *arg), diff --git a/lib/test_hmm.c b/lib/test_hmm.c index 717dcb830127..b823ba7cb6a1 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -1226,8 +1226,8 @@ static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) unsigned long *src_pfns; unsigned long *dst_pfns; - src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); - dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + src_pfns = kvcalloc(npages, sizeof(*src_pfns), GFP_KERNEL | __GFP_NOFAIL); + dst_pfns = kvcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL | __GFP_NOFAIL); migrate_device_range(src_pfns, start_pfn, npages); for (i = 0; i < npages; i++) { @@ -1250,8 +1250,8 @@ static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) } migrate_device_pages(src_pfns, dst_pfns, npages); migrate_device_finalize(src_pfns, dst_pfns, npages); - kfree(src_pfns); - kfree(dst_pfns); + kvfree(src_pfns); + kvfree(dst_pfns); } /* Removes free pages from the free list so they can't be re-allocated */ diff --git a/lib/test_xarray.c b/lib/test_xarray.c index 928fc20337e6..ab9cc42a0d74 100644 --- a/lib/test_xarray.c +++ b/lib/test_xarray.c @@ -2001,6 +2001,97 @@ static noinline void check_get_order(struct xarray *xa) } } +static noinline void check_xas_get_order(struct xarray *xa) +{ + XA_STATE(xas, xa, 0); + + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1; + unsigned int order; + unsigned long i, j; + + for (order = 0; order < max_order; order++) { + for (i = 0; i < 10; i++) { + xas_set_order(&xas, i << order, order); + do { + xas_lock(&xas); + xas_store(&xas, xa_mk_value(i)); + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); + + for (j = i << order; j < (i + 1) << order; j++) { + xas_set_order(&xas, j, 0); + rcu_read_lock(); + xas_load(&xas); + XA_BUG_ON(xa, xas_get_order(&xas) != order); + rcu_read_unlock(); + } + + xas_lock(&xas); + xas_set_order(&xas, i << order, order); + xas_store(&xas, NULL); + xas_unlock(&xas); + } + } +} + +static noinline void check_xas_conflict_get_order(struct xarray *xa) +{ + XA_STATE(xas, xa, 0); + + void *entry; + int only_once; + unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1; + unsigned int order; + unsigned long i, j, k; + + for (order = 0; order < max_order; order++) { + for (i = 0; i < 10; i++) { + xas_set_order(&xas, i << order, order); + do { + xas_lock(&xas); + xas_store(&xas, xa_mk_value(i)); + xas_unlock(&xas); + } while (xas_nomem(&xas, GFP_KERNEL)); + + /* + * Ensure xas_get_order works with xas_for_each_conflict. + */ + j = i << order; + for (k = 0; k < order; k++) { + only_once = 0; + xas_set_order(&xas, j + (1 << k), k); + xas_lock(&xas); + xas_for_each_conflict(&xas, entry) { + XA_BUG_ON(xa, entry != xa_mk_value(i)); + XA_BUG_ON(xa, xas_get_order(&xas) != order); + only_once++; + } + XA_BUG_ON(xa, only_once != 1); + xas_unlock(&xas); + } + + if (order < max_order - 1) { + only_once = 0; + xas_set_order(&xas, (i & ~1UL) << order, order + 1); + xas_lock(&xas); + xas_for_each_conflict(&xas, entry) { + XA_BUG_ON(xa, entry != xa_mk_value(i)); + XA_BUG_ON(xa, xas_get_order(&xas) != order); + only_once++; + } + XA_BUG_ON(xa, only_once != 1); + xas_unlock(&xas); + } + + xas_set_order(&xas, i << order, order); + xas_lock(&xas); + xas_store(&xas, NULL); + xas_unlock(&xas); + } + } +} + + static noinline void check_destroy(struct xarray *xa) { unsigned long index; @@ -2052,6 +2143,8 @@ static int xarray_checks(void) check_multi_store(&array); check_multi_store_advanced(&array); check_get_order(&array); + check_xas_get_order(&array); + check_xas_conflict_get_order(&array); check_xa_alloc(); check_find(&array); check_find_entry(&array); diff --git a/lib/xarray.c b/lib/xarray.c index 5e7d6334d70d..32d4bac8c94c 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -200,7 +200,8 @@ static void *xas_start(struct xa_state *xas) return entry; } -static void *xas_descend(struct xa_state *xas, struct xa_node *node) +static __always_inline void *xas_descend(struct xa_state *xas, + struct xa_node *node) { unsigned int offset = get_offset(xas->xa_index, node); void *entry = xa_entry(xas->xa, node, offset); @@ -1765,39 +1766,52 @@ unlock: EXPORT_SYMBOL(xa_store_range); /** - * xa_get_order() - Get the order of an entry. - * @xa: XArray. - * @index: Index of the entry. + * xas_get_order() - Get the order of an entry. + * @xas: XArray operation state. + * + * Called after xas_load, the xas should not be in an error state. * * Return: A number between 0 and 63 indicating the order of the entry. */ -int xa_get_order(struct xarray *xa, unsigned long index) +int xas_get_order(struct xa_state *xas) { - XA_STATE(xas, xa, index); - void *entry; int order = 0; - rcu_read_lock(); - entry = xas_load(&xas); - - if (!entry) - goto unlock; - - if (!xas.xa_node) - goto unlock; + if (!xas->xa_node) + return 0; for (;;) { - unsigned int slot = xas.xa_offset + (1 << order); + unsigned int slot = xas->xa_offset + (1 << order); if (slot >= XA_CHUNK_SIZE) break; - if (!xa_is_sibling(xas.xa_node->slots[slot])) + if (!xa_is_sibling(xa_entry(xas->xa, xas->xa_node, slot))) break; order++; } - order += xas.xa_node->shift; -unlock: + order += xas->xa_node->shift; + return order; +} +EXPORT_SYMBOL_GPL(xas_get_order); + +/** + * xa_get_order() - Get the order of an entry. + * @xa: XArray. + * @index: Index of the entry. + * + * Return: A number between 0 and 63 indicating the order of the entry. + */ +int xa_get_order(struct xarray *xa, unsigned long index) +{ + XA_STATE(xas, xa, index); + int order = 0; + void *entry; + + rcu_read_lock(); + entry = xas_load(&xas); + if (entry) + order = xas_get_order(&xas); rcu_read_unlock(); return order; |