summaryrefslogtreecommitdiffstats
path: root/mm/hmm.c
Commit message (Collapse)AuthorAgeFilesLines
...
* mm/hmm: hmm_range_fault() NULL pointer bugRalph Campbell2019-08-271-4/+9
| | | | | | | | | | | | | | | | | | | | | | | Although hmm_range_fault() calls find_vma() to make sure that a vma exists before calling walk_page_range(), hmm_vma_walk_hole() can still be called with walk->vma == NULL if the start and end address are not contained within the vma range. hmm_range_fault() /* calls find_vma() but no range check */ walk_page_range() /* calls find_vma(), sets walk->vma = NULL */ __walk_page_range() walk_pgd_range() walk_p4d_range() walk_pud_range() hmm_vma_walk_hole() hmm_vma_walk_hole_() hmm_vma_do_fault() handle_mm_fault(vma=0) Link: https://lore.kernel.org/r/20190823221753.2514-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: fix hmm_range_fault()'s handling of swapped out pagesYang, Philip2019-08-231-0/+3
| | | | | | | | | | | | | | | | hmm_range_fault() may return NULL pages because some of the pfns are equal to HMM_PFN_NONE. This happens randomly under memory pressure. The reason is during the swapped out page pte path, hmm_vma_handle_pte() doesn't update the fault variable from cpu_flags, so it failed to call hmm_vam_do_fault() to swap the page in. The fix is to call hmm_pte_need_fault() to update fault variable. Fixes: 74eee180b935 ("mm/hmm/mirror: device page fault handler") Link: https://lore.kernel.org/r/20190815205227.7949-1-Philip.Yang@amd.com Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* hmm: use mmu_notifier_get/put for 'struct hmm'Jason Gunthorpe2019-08-201-93/+28
| | | | | | | | | | | | | | | | | | | | | | This is a significant simplification, it eliminates all the remaining 'hmm' stuff in mm_struct, eliminates krefing along the critical notifier paths, and takes away all the ugly locking and abuse of page_table_lock. mmu_notifier_get() provides the single struct hmm per struct mm which eliminates mm->hmm. It also directly guarantees that no mmu_notifier op callback is callable while concurrent free is possible, this eliminates all the krefs inside the mmu_notifier callbacks. The remaining krefs in the range code were overly cautious, drivers are already not permitted to free the mirror while a range exists. Link: https://lore.kernel.org/r/20190806231548.25242-6-jgg@ziepe.ca Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: cleanup the hmm_vma_walk_hugetlb_entry stubChristoph Hellwig2019-08-071-4/+4
| | | | | | | | | | | Stub out the whole function and assign NULL to the .hugetlb_entry method if CONFIG_HUGETLB_PAGE is not set, as the method won't ever be called in that case. Link: https://lore.kernel.org/r/20190806160554.14046-13-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: cleanup the hmm_vma_handle_pmd stubChristoph Hellwig2019-08-071-10/+8
| | | | | | | | | | Stub out the whole function when CONFIG_TRANSPARENT_HUGEPAGE is not set to make the function easier to read. Link: https://lore.kernel.org/r/20190806160554.14046-12-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: only define hmm_vma_walk_pud if neededChristoph Hellwig2019-08-071-13/+16
| | | | | | | | | | | | | We only need the special pud_entry walker if PUD-sized hugepages and pte mappings are supported, else the common pagewalk code will take care of the iteration. Not implementing this callback reduced the amount of code compiled for non-x86 platforms, and also fixes compile failures with other architectures when helpers like pud_pfn are not implemented. Link: https://lore.kernel.org/r/20190806160554.14046-11-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: don't abuse pte_index() in hmm_vma_handle_pmdChristoph Hellwig2019-08-071-1/+1
| | | | | | | | | | | pte_index is an internal arch helper in various architectures, without consistent semantics. Open code that calculation of a PMD index based on the virtual address instead. Link: https://lore.kernel.org/r/20190806160554.14046-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove the mask variable in hmm_vma_walk_hugetlb_entryChristoph Hellwig2019-08-071-5/+2
| | | | | | | | | The pagewalk code already passes the value as the hmask parameter. Link: https://lore.kernel.org/r/20190806160554.14046-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove the page_shift member from struct hmm_rangeChristoph Hellwig2019-08-071-33/+9
| | | | | | | | | | | | | All users pass PAGE_SIZE here, and if we wanted to support single entries for huge pages we should really just add a HMM_FAULT_HUGEPAGE flag instead that uses the huge page size instead of having the caller calculate that size once, just for the hmm code to verify it. Link: https://lore.kernel.org/r/20190806160554.14046-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove superfluous arguments from hmm_range_registerChristoph Hellwig2019-08-071-15/+5
| | | | | | | | | | | The start, end and page_shift values are all saved in the range structure, so we might as well use that for argument passing. Link: https://lore.kernel.org/r/20190806160554.14046-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove the unused vma argument to hmm_range_dma_unmapChristoph Hellwig2019-08-071-2/+0
| | | | | | | Link: https://lore.kernel.org/r/20190806160554.14046-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove hmm_range vmaRalph Campbell2019-07-261-1/+0
| | | | | | | | | | | | Since hmm_range_fault() doesn't use the struct hmm_range vma field, remove it. Link: https://lore.kernel.org/r/20190726005650.2566-8-rcampbell@nvidia.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: remove hugetlbfs check in hmm_vma_walk_pmdRalph Campbell2019-07-261-3/+0
| | | | | | | | | | | | | | walk_page_range() will only call hmm_vma_walk_hugetlb_entry() for hugetlbfs pages and doesn't call hmm_vma_walk_pmd() in this case. Therefore, it is safe to remove the check for vma->vm_flags & VM_HUGETLB in hmm_vma_walk_pmd(). Link: https://lore.kernel.org/r/20190726005650.2566-7-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: merge hmm_range_snapshot into hmm_range_faultChristoph Hellwig2019-07-261-83/+2
| | | | | | | | | | | Add a HMM_FAULT_SNAPSHOT flag so that hmm_range_snapshot can be merged into the almost identical hmm_range_fault function. Link: https://lore.kernel.org/r/20190726005650.2566-5-rcampbell@nvidia.com Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: replace the block argument to hmm_range_fault with a flags valueChristoph Hellwig2019-07-261-37/+37
| | | | | | | | | | | This allows easier expansion to other flags, and also makes the callers a little easier to read. Link: https://lore.kernel.org/r/20190726005650.2566-4-rcampbell@nvidia.com Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: a few more C style and comment clean upsRalph Campbell2019-07-261-22/+17
| | | | | | | | | | | A few more comments and minor programming style clean ups. There should be no functional changes. Link: https://lore.kernel.org/r/20190726005650.2566-3-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: replace hmm_update with mmu_notifier_rangeRalph Campbell2019-07-261-9/+4
| | | | | | | | | | | | | | The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes a struct hmm_update which is a simplified version of struct mmu_notifier_range. This is unnecessary so replace hmm_update with mmu_notifier_range directly. Link: https://lore.kernel.org/r/20190726005650.2566-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> [jgg: white space tuning] Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* mm/hmm: comment on VM_FAULT_RETRY semantics in handle_mm_faultJason Gunthorpe2019-07-251-1/+3
| | | | | | | | | | | The magic dropping of mmap_sem when handle_mm_fault returns VM_FAULT_RETRY is rather subtile. Add a comment explaining it. Link: https://lore.kernel.org/r/20190724065258.16603-8-hch@lst.de Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> [hch: wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de>
* mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}Christoph Hellwig2019-07-251-6/+4
| | | | | | | | | | | | | | | | | We should not have two different error codes for the same condition. EAGAIN must be reserved for the FAULT_FLAG_ALLOW_RETRY retry case and signals to the caller that the mmap_sem has been unlocked. Use EBUSY for the !valid case so that callers can get the locking right. Link: https://lore.kernel.org/r/20190724065258.16603-2-hch@lst.de Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> [jgg: elaborated commit message] Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge tag 'for-linus-hmm' of ↵Linus Torvalds2019-07-141-456/+131
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull HMM updates from Jason Gunthorpe: "Improvements and bug fixes for the hmm interface in the kernel: - Improve clarity, locking and APIs related to the 'hmm mirror' feature merged last cycle. In linux-next we now see AMDGPU and nouveau to be using this API. - Remove old or transitional hmm APIs. These are hold overs from the past with no users, or APIs that existed only to manage cross tree conflicts. There are still a few more of these cleanups that didn't make the merge window cut off. - Improve some core mm APIs: - export alloc_pages_vma() for driver use - refactor into devm_request_free_mem_region() to manage DEVICE_PRIVATE resource reservations - refactor duplicative driver code into the core dev_pagemap struct - Remove hmm wrappers of improved core mm APIs, instead have drivers use the simplified API directly - Remove DEVICE_PUBLIC - Simplify the kconfig flow for the hmm users and core code" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (42 commits) mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR mm: remove the HMM config option mm: sort out the DEVICE_PRIVATE Kconfig mess mm: simplify ZONE_DEVICE page private data mm: remove hmm_devmem_add mm: remove hmm_vma_alloc_locked_page nouveau: use devm_memremap_pages directly nouveau: use alloc_page_vma directly PCI/P2PDMA: use the dev_pagemap internal refcount device-dax: use the dev_pagemap internal refcount memremap: provide an optional internal refcount in struct dev_pagemap memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag memremap: remove the data field in struct dev_pagemap memremap: add a migrate_to_ram method to struct dev_pagemap_ops memremap: lift the devmap_enable manipulation into devm_memremap_pages memremap: pass a struct dev_pagemap to ->kill and ->cleanup memremap: move dev_pagemap callbacks into a separate structure memremap: validate the pagemap type passed to devm_memremap_pages mm: factor out a devm_request_free_mem_region helper mm: export alloc_pages_vma ...
| * Merge branch 'hmm-devmem-cleanup.4' into rdma.git hmmJason Gunthorpe2019-07-021-284/+0
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph Hellwig says: ==================== Below is a series that cleans up the dev_pagemap interface so that it is more easily usable, which removes the need to wrap it in hmm and thus allowing to kill a lot of code Changes since v3: - pull in "mm/swap: Fix release_pages() when releasing devmap pages" and rebase the other patches on top of that - fold the hmm_devmem_add_resource into the DEVICE_PUBLIC memory removal patch - remove _vm_normal_page as it isn't needed without DEVICE_PUBLIC memory - pick up various ACKs Changes since v2: - fix nvdimm kunit build - add a new memory type for device dax - fix a few issues in intermediate patches that didn't show up in the end result - incorporate feedback from Michal Hocko, including killing of the DEVICE_PUBLIC memory type entirely Changes since v1: - rebase - also switch p2pdma to the internal refcount - add type checking for pgmap->type - rename the migrate method to migrate_to_ram - cleanup the altmap_valid flag - various tidbits from the reviews ==================== Conflicts resolved by: - Keeping Ira's version of the code in swap.c - Using the delete for the section in hmm.rst - Using the delete for the devmap code in hmm.c and .h * branch 'hmm-devmem-cleanup.4': (24 commits) mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR mm: remove the HMM config option mm: sort out the DEVICE_PRIVATE Kconfig mess mm: simplify ZONE_DEVICE page private data mm: remove hmm_devmem_add mm: remove hmm_vma_alloc_locked_page nouveau: use devm_memremap_pages directly nouveau: use alloc_page_vma directly PCI/P2PDMA: use the dev_pagemap internal refcount device-dax: use the dev_pagemap internal refcount memremap: provide an optional internal refcount in struct dev_pagemap memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag memremap: remove the data field in struct dev_pagemap memremap: add a migrate_to_ram method to struct dev_pagemap_ops memremap: lift the devmap_enable manipulation into devm_memremap_pages memremap: pass a struct dev_pagemap to ->kill and ->cleanup memremap: move dev_pagemap callbacks into a separate structure memremap: validate the pagemap type passed to devm_memremap_pages mm: factor out a devm_request_free_mem_region helper mm: export alloc_pages_vma ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: remove the HMM config optionChristoph Hellwig2019-07-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the mm/hmm.c code is better keyed off HMM_MIRROR. Also let nouveau depend on it instead of the mix of a dummy dependency symbol plus the actually selected one. Drop various odd dependencies, as the code is pretty portable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: remove hmm_devmem_addChristoph Hellwig2019-07-021-110/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There isn't really much value add in the hmm_devmem_add wrapper and more, as using devm_memremap_pages directly now is just as simple. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: remove hmm_vma_alloc_locked_pageChristoph Hellwig2019-07-021-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only user of it has just been removed, and there wasn't really any need to wrap a basic memory allocator to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flagChristoph Hellwig2019-07-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a flags field to struct dev_pagemap to replace the altmap_valid boolean to be a little more extensible. Also add a pgmap_altmap() helper to find the optional altmap and clean up the code using the altmap using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: remove the data field in struct dev_pagemapChristoph Hellwig2019-07-021-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct dev_pagemap is always embedded into a containing structure, so there is no need to an additional private data field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: add a migrate_to_ram method to struct dev_pagemap_opsChristoph Hellwig2019-07-021-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the hacky ->fault callback, which is currently directly called from common code through a hmm specific data structure as an exercise in layering violations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: lift the devmap_enable manipulation into devm_memremap_pagesChristoph Hellwig2019-07-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just check if there is a ->page_free operation set and take care of the static key enable, as well as the put using device managed resources. Also check that a ->page_free is provided for the pgmaps types that require it, and check for a valid type as well while we are at it. Note that this also fixes the fact that hmm never called dev_pagemap_put_ops and thus would leave the slow path enabled forever, even after a device driver unload or disable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: pass a struct dev_pagemap to ->kill and ->cleanupChristoph Hellwig2019-07-021-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passing the actual typed structure leads to more understandable code vs just passing the ref member. Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * memremap: move dev_pagemap callbacks into a separate structureChristoph Hellwig2019-07-021-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_pagemap is a growing too many callbacks. Move them into a separate ops structure so that they are not duplicated for multiple instances, and an attacker can't easily overwrite them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: factor out a devm_request_free_mem_region helperChristoph Hellwig2019-07-021-29/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep the physical address allocation that hmm_add_device does with the rest of the resource code, and allow future reuse of it without the hmm wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: don't clear ->mapping in hmm_devmem_freeChristoph Hellwig2019-07-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->mapping isn't even used by HMM users, and the field at the same offset in the zone_device part of the union is declared as pad. (Which btw is rather confusing, as DAX uses ->pgmap and ->mapping from two different sides of the union, but DAX doesn't use hmm_devmem_free). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: remove MEMORY_DEVICE_PUBLIC supportChristoph Hellwig2019-07-021-52/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code hasn't been used since it was added to the tree, and doesn't appear to actually be usable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * mm: remove the struct hmm_device infrastructureChristoph Hellwig2019-07-021-80/+0
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | This code is a trivial wrapper around device model helpers, which should have been integrated into the driver device model usage from the start. Assuming it actually had users, which it never had since the code was added more than 1 1/2 years ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * Merge tag 'v5.2-rc7' into rdma.git hmmJason Gunthorpe2019-07-021-11/+3
| |\ | |/ |/| | | Required for dependencies in the next patches.
| * mm/hmm: Fix error flows in hmm_invalidate_range_startJason Gunthorpe2019-06-271-29/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the trylock on the hmm->mirrors_sem fails the function will return without decrementing the notifiers that were previously incremented. Since the caller will not call invalidate_range_end() on EAGAIN this will result in notifiers becoming permanently incremented and deadlock. If the sync_cpu_device_pagetables() required blocking the function will not return EAGAIN even though the device continues to touch the pages. This is a violation of the mmu notifier contract. Switch, and rename, the ranges_lock to a spin lock so we can reliably obtain it without blocking during error unwind. The error unwind is necessary since the notifiers count must be held incremented across the call to sync_cpu_device_pagetables() as we cannot allow the range to become marked valid by a parallel invalidate_start/end() pair while doing sync_cpu_device_pagetables(). Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Remove confusing comment and logic from hmm_releaseJason Gunthorpe2019-06-241-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hmm_release() is called exactly once per hmm. ops->release() cannot accidentally trigger any action that would recurse back onto hmm->mirrors_sem. This fixes a use after-free race of the form: CPU0 CPU1 hmm_release() up_write(&hmm->mirrors_sem); hmm_mirror_unregister(mirror) down_write(&hmm->mirrors_sem); up_write(&hmm->mirrors_sem); kfree(mirror) mirror->ops->release(mirror) The only user we have today for ops->release is an empty function, so this is unambiguously safe. As a consequence of plugging this race drivers are not allowed to register/unregister mirrors from within a release op. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Poison hmm_range during unregisterJason Gunthorpe2019-06-241-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Trying to misuse a range outside its lifetime is a kernel bug. Use poison bytes to help detect this condition. Double unregister will reliably crash. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Remove racy protection against double-unregistrationJason Gunthorpe2019-06-241-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No other register/unregister kernel API attempts to provide this kind of protection as it is inherently racy, so just drop it. Callers should provide their own protection, and it appears nouveau already does. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Use lockdep instead of commentsJason Gunthorpe2019-06-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | So we can check locking at runtime. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Hold on to the mmget for the lifetime of the rangeJason Gunthorpe2019-06-181-21/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Range functions like hmm_range_snapshot() and hmm_range_fault() call find_vma, which requires hodling the mmget() and the mmap_sem for the mm. Make this simpler for the callers by holding the mmget() inside the range for the lifetime of the range. Other functions that accept a range should only be called if the range is registered. This has the side effect of directly preventing hmm_release() from happening while a range is registered. That means range->dead cannot be false during the lifetime of the range, so remove dead and hmm_mirror_mm_is_alive() entirely. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Do not use list*_rcu() for hmm->rangesJason Gunthorpe2019-06-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This list is always read and written while holding hmm->lock so there is no need for the confusing _rcu annotations. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <iweiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Simplify hmm_get_or_create and make it reliableJason Gunthorpe2019-06-181-47/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As coded this function can false-fail in various racy situations. Make it reliable and simpler by running under the write side of the mmap_sem and avoiding the false-failing compare/exchange pattern. Due to the mmap_sem this no longer has to avoid racing with a 2nd parallel hmm_get_or_create(). Unfortunately this still has to use the page_table_lock as the non-sleeping lock protecting mm->hmm, since the contexts where we free the hmm are incompatible with mmap_sem. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Hold a mmgrab from hmm to mmJason Gunthorpe2019-06-101-18/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So long as a struct hmm pointer exists, so should the struct mm it is linked too. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it once the hmm refcount goes to zero. Since mmdrop() (ie a 0 kref on struct mm) is now impossible with a !NULL mm->hmm delete the hmm_hmm_destroy(). Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_registerJason Gunthorpe2019-06-101-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ralph observes that hmm_range_register() can only be called by a driver while a mirror is registered. Make this clear in the API by passing in the mirror structure as a parameter. This also simplifies understanding the lifetime model for struct hmm, as the hmm pointer must be valid as part of a registered mirror so all we need in hmm_register_range() is a simple kref_get. Suggested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: fix use after free with struct hmm in the mmu notifiersJason Gunthorpe2019-06-071-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmu_notifier_unregister_no_release() is not a fence and the mmu_notifier system will continue to reference hmm->mn until the srcu grace period expires. Resulting in use after free races like this: CPU0 CPU1 __mmu_notifier_invalidate_range_start() srcu_read_lock hlist_for_each () // mn == hmm->mn hmm_mirror_unregister() hmm_put() hmm_free() mmu_notifier_unregister_no_release() hlist_del_init_rcu(hmm-mn->list) mn->ops->invalidate_range_start(mn, range); mm_get_hmm() mm->hmm = NULL; kfree(hmm) mutex_lock(&hmm->lock); Use SRCU to kfree the hmm memory so that the notifiers can rely on hmm existing. Get the now-safe hmm struct through container_of and directly check kref_get_unless_zero to lock it against free. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Philip Yang <Philip.Yang@amd.com>
| * mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blockingKuehling, Felix2019-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | Don't set this flag by default in hmm_vma_do_fault. It is set conditionally just a few lines below. Setting it unconditionally can lead to handle_mm_fault doing a non-blocking fault, returning -EBUSY and unlocking mmap_sem unexpectedly. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * mm/hmm: support automatic NUMA balancingPhilip Yang2019-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the page is migrating by NUMA balancing, HMM failed to detect this condition and still return the old page. Application will use the new page migrated, but driver pass the old page physical address to GPU, this crash the application later. Use pte_protnone(pte) to return this condition and then hmm_vma_do_fault will allocate new page. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * mm/hmm: clean up some coding style and commentsRalph Campbell2019-06-061-30/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no functional changes, just some coding style clean ups and minor comment changes. Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * mm/hmm.c: suppress compilation warnings when CONFIG_HUGETLB_PAGE is not setJason Gunthorpe2019-06-061-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc reports that several variables are defined but not used. For the first hunk CONFIG_HUGETLB_PAGE the entire if block is already protected by pud_huge() which is forced to 0. None of the stuff under the ifdef causes compilation problems as it is already stubbed out in the header files. For the second hunk the dummy huge_page_shift macro doesn't touch the argument, so just inline the argument. Link: http://lkml.kernel.org/r/20190522195151.GA23955@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>