path: root/mm
Commit message | Author | Date | Files | Lines
...
* mm/gup_benchmark.c: prevent integer overflow in ioctl | Dan Carpenter | 2018-10-31 | 1 file | -0/+3
    The concern here is that "gup->size" is a u64 and "nr_pages" is an unsigned long. On 32-bit systems we could trick the kernel into allocating fewer pages than expected.

    Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain
    Fixes: 64c349f4ae78 ("mm: add infrastructure for get_user_pages_fast() benchmarking")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
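As a hedged illustration of the truncation this entry guards against, a minimal sketch in kernel-style C (the helper name and its placement are assumptions, not the upstream diff):

    static int sketch_validate_gup_size(__u64 size)
    {
            unsigned long nr_pages = size >> PAGE_SHIFT;    /* narrows to 32 bits on a 32-bit kernel */

            /* Reject the request if the narrowed count no longer matches the 64-bit arithmetic. */
            if ((__u64)nr_pages != (size >> PAGE_SHIFT))
                    return -EINVAL;

            return 0;
    }

A caller would run such a check before sizing any allocation derived from nr_pages.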
* mm/hmm: invalidate device page table at start of invalidation | Jérôme Glisse | 2018-10-31 | 1 file | -12/+15
    Invalidate the device page table at the start of invalidation, and invalidate any in-progress CPU page table snapshotting at both the start and the end of any invalidation. This is helpful when the device needs to dirty pages, because the device page table reports the page as dirty; dirtying must happen in the start mmu notifier callback, not in the end one.

    Link: http://lkml.kernel.org/r/20181019160442.18723-7-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/hmm: use a structure for update callback parameters | Jérôme Glisse | 2018-10-31 | 1 file | -11/+22
    Use a structure to gather all the parameters for the update callback. This makes it easier to add new parameters without having to update every callback function signature. The hmm_update structure is always associated with an mmu_notifier callback, so we are not planning on grouping multiple updates together. Nor do we care about the page size for the range, as the range will always fully cover the pages being invalidated (this is an mmu_notifier property).

    Link: http://lkml.kernel.org/r/20181019160442.18723-6-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
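A minimal sketch of the grouped-parameter idea described above; the field set and callback signature are assumptions based on the commit text, not a verbatim copy of the upstream header:

    enum hmm_update_event {
            HMM_UPDATE_INVALIDATE,
    };

    struct hmm_update {
            unsigned long           start;
            unsigned long           end;
            enum hmm_update_event   event;
            bool                    blockable;
    };

    /* The mirror callback takes one struct instead of a growing argument list. */
    int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror,
                                      const struct hmm_update *update);

Adding a new parameter then means adding a field to hmm_update rather than touching every driver's callback signature.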
* mm/hmm: properly handle migration pmd | Jérôme Glisse | 2018-10-31 | 1 file | -6/+34
    Before this patch a migration pmd entry (!pmd_present()) would have been treated as a bad entry (pmd_bad() returns true on a migration pmd entry). The outcome was that the device driver would believe the range covered by the pmd was bad and would either SIGBUS or simply kill all the device's threads (each device driver decides how to react when the device tries to access a poisonous or invalid range of memory). This patch explicitly handles migration pmd entries, which are non-present pmd entries, and either waits for the migration to finish or reports an empty range (when the device is just trying to pre-fill a range of virtual addresses and thus does not want to wait or trigger a page fault).

    Link: http://lkml.kernel.org/r/20181019160442.18723-5-jglisse@redhat.com
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Cc: Zi Yan <zi.yan@cs.rutgers.edu>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
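A condensed sketch of the control flow described above, assuming a hypothetical helper that reports the range as empty (hmm_pfns_fill_empty() is not a real function); this is not the literal upstream walker body:

    static int sketch_handle_nonpresent_pmd(struct mm_walk *walk, pmd_t *pmdp,
                                            struct hmm_range *range, bool fault_requested)
    {
            pmd_t pmd = READ_ONCE(*pmdp);

            if (pmd_present(pmd))
                    return 0;       /* normal path, not shown */

            if (is_pmd_migration_entry(pmd)) {
                    /* Pre-fill lookups must not block: report the range as empty. */
                    if (!fault_requested)
                            return hmm_pfns_fill_empty(range);      /* hypothetical helper */
                    /* Faulting path: wait for the migration to finish, then retry. */
                    pmd_migration_entry_wait(walk->mm, pmdp);
                    return -EBUSY;
            }

            /* Any other non-present pmd is reported as empty, not as a bad range. */
            return hmm_pfns_fill_empty(range);
    }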
* mm/hmm: fix race between hmm_mirror_unregister() and mmu_notifier callback | Ralph Campbell | 2018-10-31 | 1 file | -15/+21
    In hmm_mirror_unregister(), mm->hmm is set to NULL and then mmu_notifier_unregister_no_release() is called. That creates a small window where mmu_notifier can call mmu_notifier_ops with mm->hmm equal to NULL. Fix this by first unregistering mmu notifier callbacks and then setting mm->hmm to NULL. Similarly in hmm_register(), set mm->hmm before registering mmu_notifier callbacks so callback functions always see mm->hmm set.

    Link: http://lkml.kernel.org/r/20181019160442.18723-4-jglisse@redhat.com
    Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
    Reviewed-by: Balbir Singh <bsingharora@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
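A simplified sketch of the corrected teardown ordering (not the full upstream function, which has more to clean up than shown here):

    void sketch_hmm_mirror_unregister(struct hmm_mirror *mirror)
    {
            struct hmm *hmm = mirror->hmm;
            struct mm_struct *mm = hmm->mm;

            /* 1. Stop notifier callbacks first, so none can run against a NULL mm->hmm. */
            mmu_notifier_unregister_no_release(&hmm->mmu_notifier, mm);

            /* 2. Only then detach the hmm structure from the mm. */
            mm->hmm = NULL;
    }

hmm_register() applies the mirror-image ordering: set mm->hmm before registering the notifier.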
* mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly | Ralph Campbell | 2018-10-31 | 1 file | -1/+23
    Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), and the map_pte() part was most probably lost in some rebase. Without this patch the slow migration path cannot migrate pages back from private ZONE_DEVICE memory to regular memory. This was found after stress testing migration back to system memory, and can ultimately lead to the CPU constantly page faulting, looping on the special swap entry.

    Link: http://lkml.kernel.org/r/20181019160442.18723-3-jglisse@redhat.com
    Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Reviewed-by: Balbir Singh <bsingharora@gmail.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
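A condensed sketch of the missing case (an assumption boiled down from the commit description, not the literal map_pte() diff): a non-present pte can still refer to a mapped page when it encodes a device-private swap entry.

    static bool sketch_pte_is_device_private(pte_t pte)
    {
            if (pte_present(pte))
                    return false;

            if (is_swap_pte(pte)) {
                    swp_entry_t entry = pte_to_swp_entry(pte);

                    /* Private ZONE_DEVICE pages are non-present but still count as mapped. */
                    if (is_device_private_entry(entry))
                            return true;
            }
            return false;
    }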
* mm/hmm: fix utf8 ... | Jérôme Glisse | 2018-10-31 | 1 file | -1/+1
    Patch series "HMM updates, improvements and fixes", v2

    A few fixes that only affect HMM users. Improve the synchronization callback so that we match what other mmu_notifier listeners do, and add proper support for the new blockable flag in the process.

    For curious folks, here are branches to leverage HMM in various existing device drivers:
        https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-nouveau-v01
        https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-radeon-v00
        https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-intel-v00

    More to come (amd gpu, Mellanox, ...). I expect more of the preparatory work for nouveau will be merged in 4.20 (like we have been doing since 4.16), and I will wait until this patchset is upstream before pushing the patches that actually make use of HMM (to avoid complex tree inter-dependency).

    This patch (of 6): Somehow the utf-8 encoding must have been broken.

    Link: http://lkml.kernel.org/r/20181019160442.18723-2-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax | Linus Torvalds | 2018-10-28 | 17 files | -940/+647
    Pull XArray conversion from Matthew Wilcox:
      "The XArray provides an improved interface to the radix tree data structure, providing locking as part of the API, specifying GFP flags at allocation time, eliminating preloading, less re-walking the tree, more efficient iterations and not exposing RCU-protected pointers to its users.

       This patch set
         1. Introduces the XArray implementation
         2. Converts the pagecache to use it
         3. Converts memremap to use it

       The page cache is the most complex and important user of the radix tree, so converting it was most important. Converting the memremap code removes the only other user of the multiorder code, which allows us to remove the radix tree code that supported it.

       I have 40+ followup patches to convert many other users of the radix tree over to the XArray, but I'd like to get this part in first. The other conversions haven't been in linux-next and aren't suitable for applying yet, but you can see them in the xarray-conv branch if you're interested"

    * 'xarray' of git://git.infradead.org/users/willy/linux-dax: (90 commits)
      radix tree: Remove multiorder support
      radix tree test: Convert multiorder tests to XArray
      radix tree tests: Convert item_delete_rcu to XArray
      radix tree tests: Convert item_kill_tree to XArray
      radix tree tests: Move item_insert_order
      radix tree test suite: Remove multiorder benchmarking
      radix tree test suite: Remove __item_insert
      memremap: Convert to XArray
      xarray: Add range store functionality
      xarray: Move multiorder_check to in-kernel tests
      xarray: Move multiorder_shrink to kernel tests
      xarray: Move multiorder account test in-kernel
      radix tree test suite: Convert iteration test to XArray
      radix tree test suite: Convert tag_tagged_items to XArray
      radix tree: Remove radix_tree_clear_tags
      radix tree: Remove radix_tree_maybe_preload_order
      radix tree: Remove split/join code
      radix tree: Remove radix_tree_update_node_t
      page cache: Finish XArray conversion
      dax: Convert page fault handlers to XArray
      ...
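A minimal XArray usage sketch illustrating the points in the pull message: GFP flags are supplied at store time rather than at init, locking is handled inside the API, and there is no preload step (the array and function names here are illustrative):

    #include <linux/xarray.h>

    static DEFINE_XARRAY(sketch_array);

    static int sketch_store_and_load(unsigned long index, void *item)
    {
            void *old;

            /* xa_store() takes the GFP flags here instead of at initialisation time. */
            old = xa_store(&sketch_array, index, item, GFP_KERNEL);
            if (xa_is_err(old))
                    return xa_err(old);

            /* xa_load() does its own RCU locking; no explicit rcu_read_lock() is needed here. */
            return xa_load(&sketch_array, index) == item ? 0 : -ENOENT;
    }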
  * radix tree: Remove multiorder support | Matthew Wilcox | 2018-10-21 | 1 file | -2/+2
      All users have now been converted to the XArray. Removing the support reduces code size and ensures new users will use the XArray instead.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Finish XArray conversion | Matthew Wilcox | 2018-10-21 | 1 file | -1/+1
      With no more radix tree API users left, we can drop the GFP flags and use xa_init() instead of INIT_RADIX_TREE().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Comment fixups | Matthew Wilcox | 2018-10-21 | 1 file | -6/+6
      Remove the last mentions of radix tree from various comments.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * memfd: Convert memfd_tag_pins to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -26/+18
      Switch to a batch-processing model like memfd_wait_for_pins() and use the xa_state previously set up by memfd_wait_for_pins().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
  * memfd: Convert memfd_wait_for_pins to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -36/+25
      Simplify the locking by taking the spinlock while we walk the tree, on the assumption that many acquires and releases of the lock will be worse than holding the lock while we process an entire batch of pages.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
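A sketch of the batch-walk pattern described above, assuming a periodic lock drop every 64 entries (the batch size and helper name are illustrative, not the memfd implementation):

    static void sketch_walk_batched(struct address_space *mapping)
    {
            XA_STATE(xas, &mapping->i_pages, 0);
            struct page *page;
            unsigned int batch = 0;

            xas_lock_irq(&xas);
            xas_for_each(&xas, page, ULONG_MAX) {
                    /* ... examine one entry while still holding the lock ... */
                    if (++batch % 64)
                            continue;
                    /* Drop the lock periodically instead of taking it once per entry. */
                    xas_pause(&xas);
                    xas_unlock_irq(&xas);
                    cond_resched();
                    xas_lock_irq(&xas);
            }
            xas_unlock_irq(&xas);
    }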
  * shmem: Convert shmem_partial_swap_usage to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -14/+4
      Simpler code because the xarray takes care of things like the limit and dereferencing the slot.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert shmem_free_swap to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -2/+2
      Since we are conditionally storing NULL in the XArray, we do not need to allocate memory and the GFP flags will be unused.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert shmem_alloc_hugepage to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -10/+4
      xa_find() is a slightly easier API to use than radix_tree_gang_lookup_slot() because it contains its own RCU locking. This commit removes the last user of radix_tree_gang_lookup_slot(), so remove the function too.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert shmem_add_to_page_cache to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -47/+34
      We can use xas_find_conflict() instead of radix_tree_gang_lookup_slot() to find any conflicting entry and combine the three paths through this function into one.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert find_swap_entry to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -17/+10
      This is a 1:1 conversion. The major part of this patch is converting the test framework from userspace to kernel space and mirroring the algorithm now used in find_swap_entry().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert shmem_confirm_swap to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -6/+1
      xa_load() has its own RCU locking, so we can eliminate it here.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * shmem: Convert shmem_radix_tree_replace to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -14/+8
      Rename shmem_radix_tree_replace() to shmem_replace_entry() and convert it to use the XArray API.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * pagevec: Use xa_mark_t | Matthew Wilcox | 2018-10-21 | 1 file | -2/+2
      Removes sparse warnings.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert is_page_cache_freeable to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -4/+4
      This is just a variable rename and comment change.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert khugepaged_scan_shmem to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -12/+5
      Slightly shorter and easier to read code.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert collapse_shmem to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -93/+66
      I found another victim of the radix tree being hard to use. Because there was no call to radix_tree_preload(), khugepaged was allocating radix_tree_nodes using GFP_ATOMIC. I also converted a local_irq_save()/restore() pair to disable()/enable().
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert huge_memory to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -10/+7
      Quite a straightforward conversion.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert page migration to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -30/+18
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert __do_page_cache_readahead to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -3/+1
      This one is trivial.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert delete_from_swap_cache to XArray | Matthew Wilcox | 2018-10-21 | 2 files | -15/+11
      Both callers of __delete_from_swap_cache have the swp_entry_t already, so pass that in to make constructing the XA_STATE easier.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert add_to_swap_cache to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -64/+29
      Combine __add_to_swap_cache and add_to_swap_cache into one function since there is no more need to preload.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert truncate to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -9/+6
      This is essentially xa_cmpxchg() with the locking handled above us, and it doesn't have to handle replacing a NULL entry.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert workingset to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -30/+21
      We construct an XA_STATE and use it to delete the node with xas_store() rather than adding a special function for this unique use case. Includes a test that simulates this usage for the test suite.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * mm: Convert page-writeback to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -46/+26
      Includes moving mapping_tagged() to fs.h as a static inline, and changing it to return bool.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert filemap_range_has_page to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -8/+19
      Instead of calling find_get_pages_range() and putting any reference, use xas_find() to iterate over any entries in the range, skipping the shadow/swap entries.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Remove stray radix comment | Matthew Wilcox | 2018-10-21 | 1 file | -1/+1
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert delete_batch to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -15/+13
      Rename the function from page_cache_tree_delete_batch to just page_cache_delete_batch.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert filemap_map_pages to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -29/+13
      Slight change of strategy here; if we have trouble getting hold of a page for whatever reason (eg a compound page is split underneath us), don't spin to stabilise the page, just continue the iteration, like we would if we failed to trylock the page. Since this is a speculative optimisation, it feels like we should allow the process to take an extra fault if it turns out to need this page instead of spending time to pin down a page it may not need.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert find_get_entries_tag to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -30/+24
      Slightly shorter and simpler code.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert find_get_pages_range_tag to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -42/+26
      The 'end' parameter of the xas_for_each iterator avoids a useless iteration at the end of the range.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert find_get_pages_contig to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -31/+22
      There's no direct replacement for radix_tree_for_each_contig() in the XArray API as it's an unusual thing to do. Instead, open-code a loop using xas_next(). This removes the only user of radix_tree_for_each_contig() so delete the iterator from the API and the test suite code for it.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
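A trimmed-down sketch of the open-coded contiguous walk described above (a sketch only; it omits the page refcount handling the real function needs):

    static unsigned int sketch_count_contig(struct address_space *mapping,
                                            pgoff_t start, unsigned int max)
    {
            XA_STATE(xas, &mapping->i_pages, start);
            struct page *page;
            unsigned int found = 0;

            rcu_read_lock();
            for (page = xas_load(&xas); page; page = xas_next(&xas)) {
                    if (xas_retry(&xas, page))
                            continue;
                    /* A shadow/swap value entry (or a hole) ends the contiguous run. */
                    if (xa_is_value(page))
                            break;
                    if (++found == max)
                            break;
            }
            rcu_read_unlock();
            return found;
    }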
  * page cache: Convert find_get_pages_range to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -33/+19
      The 'end' parameter of the xas_for_each iterator avoids a useless iteration at the end of the range.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert find_get_entries to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -28/+23
      Slightly shorter and simpler code.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert find_get_entry to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -35/+28
      Slightly shorter and simpler code.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert page deletion to XArray | Matthew Wilcox | 2018-10-21 | 1 file | -18/+13
      The code is slightly shorter and simpler.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Add and replace pages using the XArray | Matthew Wilcox | 2018-10-21 | 1 file | -82/+57
      Use the XArray APIs to add and replace pages in the page cache. This removes two uses of the radix tree preload API and is significantly shorter code. It also removes the last user of __radix_tree_create() outside radix-tree.c itself, so make it static.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  * page cache: Convert hole search to XArray | Matthew Wilcox | 2018-10-21 | 2 files | -62/+52
      The page cache offers the ability to search for a miss in the previous or next N locations. Rather than teach the XArray about the page cache's definition of a miss, use xas_prev() and xas_next() to search the page array. This should be more efficient as it does not have to start the lookup from the top for each index.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
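A condensed sketch of the backward scan idea, loosely modelled on the description above (not the exact upstream helper):

    static pgoff_t sketch_prev_hole(struct address_space *mapping,
                                    pgoff_t index, unsigned long max_scan)
    {
            XA_STATE(xas, &mapping->i_pages, index);

            while (max_scan--) {
                    void *entry = xas_prev(&xas);

                    /* A missing entry, a shadow value, or a wrap past index 0 is the hole. */
                    if (!entry || xa_is_value(entry) || xas.xa_index == ULONG_MAX)
                            break;
            }
            return xas.xa_index;
    }

Because the xa_state keeps its position, each step continues from the previous node instead of restarting the lookup from the top of the tree.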
  * xarray: Define struct xa_node | Matthew Wilcox | 2018-10-21 | 1 file | -8/+8
      This is a direct replacement for struct radix_tree_node. A couple of struct members have changed name, so convert those. Use a #define so that radix tree users continue to work without change.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
  * xarray: Replace exceptional entries | Matthew Wilcox | 2018-09-29 | 10 files | -29/+28
      Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
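A minimal sketch of value entries in use: a small integer is encoded directly into the entry, no allocation happens, and it can be told apart from a pointer on load (the array and function names are illustrative):

    #include <linux/xarray.h>

    static DEFINE_XARRAY(sketch_values);

    static int sketch_store_value(unsigned long index, unsigned long value)
    {
            void *entry = xa_mk_value(value);       /* encode an integer, not a pointer */

            entry = xa_store(&sketch_values, index, entry, GFP_KERNEL);
            if (xa_is_err(entry))
                    return xa_err(entry);

            entry = xa_load(&sketch_values, index);
            if (!xa_is_value(entry))                /* distinguishes values from pointers */
                    return -EINVAL;

            return xa_to_value(entry) == value ? 0 : -EINVAL;
    }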
* hugetlbfs: dirty pages as they are added to pagecache | Mike Kravetz | 2018-10-26 | 1 file | -0/+6
    Some test systems were experiencing negative huge page reserve counts and incorrect file block counts. This was traced to /proc/sys/vm/drop_caches removing clean pages from hugetlbfs file pagecaches. When code other than hugetlbfs removes the pages, the appropriate accounting is not performed.

    This can be recreated as follows:
      fallocate -l 2M /dev/hugepages/foo
      echo 1 > /proc/sys/vm/drop_caches
      fallocate -l 2M /dev/hugepages/foo
      grep -i huge /proc/meminfo
        AnonHugePages:         0 kB
        ShmemHugePages:        0 kB
        HugePages_Total:    2048
        HugePages_Free:     2047
        HugePages_Rsvd:    18446744073709551615
        HugePages_Surp:        0
        Hugepagesize:       2048 kB
        Hugetlb:         4194304 kB
      ls -lsh /dev/hugepages/foo
        4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo

    To address this issue, dirty pages as they are added to the pagecache. The problem can easily be reproduced with fallocate as shown above. Read-faulted pages will eventually end up being marked dirty, but there is a window where they are clean and could be impacted by code such as drop_caches. So, just dirty them all as they are added to the pagecache.

    Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com
    Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
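A hedged sketch of the idea (the helper and its placement are assumptions; the real fix touches the hugetlbfs insertion paths themselves):

    static int sketch_hugetlb_add_and_dirty(struct page *page,
                                            struct address_space *mapping, pgoff_t idx)
    {
            int err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);

            /* Mark the page dirty immediately so drop_caches never sees it as clean. */
            if (!err)
                    set_page_dirty(page);
            return err;
    }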
* mm: export add_swap_extent() | Omar Sandoval | 2018-10-26 | 1 file | -0/+1
    Btrfs currently does not support swap files because swap's use of bmap does not work with copy-on-write and multiple devices. See 35054394c4b3 ("Btrfs: stop providing a bmap operation to avoid swapfile corruptions"). However, the swap code has a mechanism for the filesystem to manually add swap extents using add_swap_extent() from the ->swap_activate() aop. iomap has done this since 67482129cdab ("iomap: add a swapfile activation function"). Btrfs will do the same in a later patch, so export add_swap_extent().

    Link: http://lkml.kernel.org/r/bb1208575e02829aae51b538709476964f97b1ea.1536704650.git.osandov@fb.com
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: David Sterba <dsterba@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS | Omar Sandoval | 2018-10-26 | 2 files | -8/+11
    The SWP_FILE flag serves two purposes: to make swap_{read,write}page() go through the filesystem, and to make swapoff() call ->swap_deactivate(). For Btrfs, we want the latter but not the former, so split this flag into two. This makes us always call ->swap_deactivate() if ->swap_activate() succeeded, not just if it didn't add any swap extents itself. This also resolves the issue of the very misleading name of SWP_FILE, which is only used for swap files over NFS.

    Link: http://lkml.kernel.org/r/6d63d8668c4287a4f6d203d65696e96f80abdfc7.1536704650.git.osandov@fb.com
    Signed-off-by: Omar Sandoval <osandov@fb.com>
    Reviewed-by: Nikolay Borisov <nborisov@suse.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: David Sterba <dsterba@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
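A sketch of the resulting flag split (the bit values here are assumptions; the authoritative definitions live in include/linux/swap.h):

    /* was: SWP_FILE, which conflated the two meanings below */
    enum {
            /* ... other SWP_* flags ... */
            SWP_ACTIVATED   = (1 << 7),     /* ->swap_activate() succeeded; call ->swap_deactivate() at swapoff */
            SWP_FS          = (1 << 8),     /* swap_readpage()/swap_writepage() must go through the filesystem */
    };

With this split, swapoff() keys off SWP_ACTIVATED while the swap I/O path keys off SWP_FS, which is the combination Btrfs needs.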