[PATCH] madvise MADV_DONTFORK/MADV_DOFORK

Currently, copy-on-write may change the physical address of a page even if the user requested that the page be pinned in memory (either by mlock or by get_user_pages). This happens if the process forks in the meantime and the parent then writes to that page. As a result, the page is orphaned: in the case of get_user_pages, the application will never see any data that hardware DMAs into this page after the COW. In the case of mlock'd memory, the parent does not get the realtime/security benefits of mlock.

In particular, this affects the Infiniband modules, which do DMA from and into user pages all the time.

This patch adds madvise options to control whether a memory range is inherited across fork. This is useful, e.g., when hardware is doing DMA from/into these pages. It could also be useful to an application that wants to speed up its forks by cutting large areas out of consideration.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
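
As a rough illustration of how an application might use the new advice values from userspace, the sketch below maps an anonymous buffer and marks it MADV_DONTFORK, so that a later fork() does not copy the range into the child and the parent's pages are not subject to COW on account of that child. The helper names alloc_dma_buffer() and free_dma_buffer() are hypothetical and not part of this patch; the sketch also assumes a libc whose <sys/mman.h> exposes MADV_DONTFORK and MADV_DOFORK.

```c
#define _DEFAULT_SOURCE		/* assumption: glibc-style feature macro to expose MADV_DONTFORK */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the patch): map `len` bytes of anonymous
 * memory and ask the kernel not to inherit the range across fork().
 * Returns the mapping, or NULL on failure.
 */
static void *alloc_dma_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	/* Children forked after this call will not have this range mapped. */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}

/*
 * Hypothetical helper: restore the default inheritance with MADV_DOFORK,
 * then unmap. Returns 0 on success, -1 on failure.
 */
static int free_dma_buffer(void *buf, size_t len)
{
	if (madvise(buf, len, MADV_DOFORK) != 0)
		return -1;
	return munmap(buf, len);
}
```

Because a child created after MADV_DONTFORK has no mapping for the range, any access to it in the child faults. The advice therefore suits buffers that only the parent is expected to touch.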
author: Michael S. Tsirkin <mst@mellanox.co.il> 2006-02-14 13:53:08 -0800
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-02-14 16:09:34 -0800
commit: f822566165dd46ff5de9bf895cfa6c51f53bb0c4 (patch)
tree: e052f406d5a14140d17f76dc8914d33bbc8e5f1d /mm
parent: 8861da31e3b3e3df7b05e7b157230de3d486e53b (diff)
download: linux-f822566165dd46ff5de9bf895cfa6c51f53bb0c4.tar.gz
linux-f822566165dd46ff5de9bf895cfa6c51f53bb0c4.tar.bz2
linux-f822566165dd46ff5de9bf895cfa6c51f53bb0c4.zip
1 files changed, 17 insertions, 4 deletions
diff --git a/mm/madvise.c b/mm/madvise.c
index ae0ae3ea299a..af3d573b0141 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -22,16 +22,23 @@ static long madvise_behavior(struct vm_area_struct * vma,
 	struct mm_struct * mm = vma->vm_mm;
 	int error = 0;
 	pgoff_t pgoff;
-	int new_flags = vma->vm_flags & ~VM_READHINTMASK;
+	int new_flags = vma->vm_flags;
 
 	switch (behavior) {
+	case MADV_NORMAL:
+		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
+		break;
 	case MADV_SEQUENTIAL:
-		new_flags |= VM_SEQ_READ;
+		new_flags = (new_flags & ~VM_RAND_READ) | VM_SEQ_READ;
 		break;
 	case MADV_RANDOM:
-		new_flags |= VM_RAND_READ;
+		new_flags = (new_flags & ~VM_SEQ_READ) | VM_RAND_READ;
 		break;
-	default:
+	case MADV_DONTFORK:
+		new_flags |= VM_DONTCOPY;
+		break;
+	case MADV_DOFORK:
+		new_flags &= ~VM_DONTCOPY;
 		break;
 	}
 
@@ -177,6 +184,12 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	long error;
 
 	switch (behavior) {
+	case MADV_DOFORK:
+		if (vma->vm_flags & VM_IO) {
+			error = -EINVAL;
+			break;
+		}
+	case MADV_DONTFORK:
 	case MADV_NORMAL:
 	case MADV_SEQUENTIAL:
 	case MADV_RANDOM:
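
The flag handling in the madvise_behavior() hunk can be modelled in plain C to show why the unconditional `& ~VM_READHINTMASK` was dropped: each advice value now touches only its own bits, so setting or clearing VM_DONTCOPY leaves the read hints alone and vice versa. This is a standalone sketch, not kernel code. The bit values and the ADV_* names are illustrative assumptions, and update_flags() is a hypothetical stand-in for the switch in the patch.

```c
/* Illustrative bit values only; the kernel's real definitions differ. */
#define VM_SEQ_READ	0x1UL
#define VM_RAND_READ	0x2UL
#define VM_DONTCOPY	0x4UL

/* Hypothetical stand-ins for the MADV_* advice values. */
enum advice {
	ADV_NORMAL,
	ADV_SEQUENTIAL,
	ADV_RANDOM,
	ADV_DONTFORK,
	ADV_DOFORK,
};

/* Mirrors the per-case flag updates from the patched switch statement. */
static unsigned long update_flags(unsigned long flags, enum advice behavior)
{
	switch (behavior) {
	case ADV_NORMAL:
		/* Clear both read hints, keep everything else. */
		flags = flags & ~VM_RAND_READ & ~VM_SEQ_READ;
		break;
	case ADV_SEQUENTIAL:
		/* The two read hints are mutually exclusive. */
		flags = (flags & ~VM_RAND_READ) | VM_SEQ_READ;
		break;
	case ADV_RANDOM:
		flags = (flags & ~VM_SEQ_READ) | VM_RAND_READ;
		break;
	case ADV_DONTFORK:
		/* Only the fork-inheritance bit changes. */
		flags |= VM_DONTCOPY;
		break;
	case ADV_DOFORK:
		flags &= ~VM_DONTCOPY;
		break;
	}
	return flags;
}
```

For example, under these assumptions, update_flags(VM_SEQ_READ, ADV_DONTFORK) keeps the sequential hint and adds VM_DONTCOPY, whereas clearing the read-hint mask up front, as the old code did, would have discarded the hint.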