From 87229ad9de079cb12ee09a3dc16113c390b729d5 Mon Sep 17 00:00:00 2001
From: Dave Airlie
Date: Wed, 19 Sep 2012 11:12:41 +1000
Subject: drm: micro optimise cache flushing

We hit this a lot with i915 and although we'd like to engineer things
to hit it a lot less, this commit at least makes it consume a few
fewer cycles. Hoisting the clflush-size load out of the loop takes the
generated code from something containing

    movzwl 0x0(%rip),%r10d

to

    add    %r8,%rdx

I only noticed it while using perf to profile something else.

Reviewed-by: Chris Wilson
Signed-off-by: Dave Airlie
---
 drivers/gpu/drm/drm_cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'drivers/gpu/drm/drm_cache.c')

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 08758e061478..3dbc7f17eb11 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -37,12 +37,13 @@ drm_clflush_page(struct page *page)
 {
 	uint8_t *page_virtual;
 	unsigned int i;
+	const int size = boot_cpu_data.x86_clflush_size;
 
 	if (unlikely(page == NULL))
 		return;
 
 	page_virtual = kmap_atomic(page);
-	for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
+	for (i = 0; i < PAGE_SIZE; i += size)
 		clflush(page_virtual + i);
 	kunmap_atomic(page_virtual);
 }
-- 
cgit v1.2.3

From 9da3da660d8c19a54f6e93361d147509be3fff84 Mon Sep 17 00:00:00 2001
From: Chris Wilson
Date: Fri, 1 Jun 2012 15:20:22 +0100
Subject: drm/i915: Replace the array of pages with a scatterlist

Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage this offers, besides unifying the page tracking
structures, is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages, which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality it is an insignificant cost of either binding the pages or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.

Signed-off-by: Chris Wilson
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/drm_cache.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

(limited to 'drivers/gpu/drm/drm_cache.c')

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 3dbc7f17eb11..4a4274b348b6 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -100,6 +100,31 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 }
 EXPORT_SYMBOL(drm_clflush_pages);
 
+void
+drm_clflush_sg(struct sg_table *st)
+{
+#if defined(CONFIG_X86)
+	if (cpu_has_clflush) {
+		struct scatterlist *sg;
+		int i;
+
+		mb();
+		for_each_sg(st->sgl, sg, st->nents, i)
+			drm_clflush_page(sg_page(sg));
+		mb();
+
+		return;
+	}
+
+	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#else
+	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
+	WARN_ON_ONCE(1);
+#endif
+}
+EXPORT_SYMBOL(drm_clflush_sg);
+
 void
 drm_clflush_virt_range(char *addr, unsigned long length)
 {
-- 
cgit v1.2.3
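
The following is an illustrative sketch, not part of either patch above: one
way a caller might wrap an existing page array in an sg_table and hand it to
the new drm_clflush_sg(). The names example_flush_pages, pages and num_pages
are hypothetical, the header providing the drm_clflush_sg() prototype is
assumed to be drmP.h in this era, and error handling and coalescing of
contiguous pages are omitted for brevity.

#include <linux/scatterlist.h>
#include <drm/drmP.h>   /* assumed location of the drm_clflush_sg() prototype */

static int example_flush_pages(struct page **pages, unsigned int num_pages)
{
        struct sg_table st;
        struct scatterlist *sg;
        unsigned int i;

        /* One PAGE_SIZE entry per page; a real user may coalesce runs. */
        if (sg_alloc_table(&st, num_pages, GFP_KERNEL))
                return -ENOMEM;

        for_each_sg(st.sgl, sg, st.nents, i)
                sg_set_page(sg, pages[i], PAGE_SIZE, 0);

        /* Clflushes each backing page, bracketed by mb() on x86. */
        drm_clflush_sg(&st);

        sg_free_table(&st);
        return 0;
}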