diff options
author | Daniel Vetter <daniel.vetter@ffwll.ch> | 2013-05-24 21:29:32 +0200 |
---|---|---|
committer | Daniel Vetter <daniel.vetter@ffwll.ch> | 2013-06-03 14:35:18 +0200 |
commit | 7abb690a0e095717420ba78dcab4309abbbec78a (patch) | |
tree | 32d3a843c5fa4881e78679163448409dc5aa3060 | |
parent | d683b96b072dc4680fc74964eca77e6a23d1fa6e (diff) | |
download | linux-stable-7abb690a0e095717420ba78dcab4309abbbec78a.tar.gz linux-stable-7abb690a0e095717420ba78dcab4309abbbec78a.tar.bz2 linux-stable-7abb690a0e095717420ba78dcab4309abbbec78a.zip |
drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus
Chris Wilson noticed that since
commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9]
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 15 17:17:22 2012 +0100
drm/i915: clear up wedged transitions
X can again get -EIO when it does not expect it. And even worse score
a SIGBUS when accessing gtt mmaps. The established ABI is that we
_only_ return an -EIO from execbuf - all other ioctls should just
work. And since the reset code moves all bos out of gpu domains and
clears out all the last_seqno/ring tracking there really shouldn't be
any reason for non-execbuf code to ever touch the hw and see an -EIO.
After some extensive discussions we've noticed that these spurios -EIO
are caused by i915_gem_wait_for_error:
http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html
That is easy to fix by returning 0 instead of -EIO, since grabbing the
dev->struct_mutex does not yet mean that we actually want to touch the
hw. And so there is no reason at all to fail with -EIO.
But that's not the entire since, since often (at least it's easily
googleable) dmesg indicates that the reset fails and we declare the
gpu wedged. Then, quite a bit later X wakes up with the "Timed out
waiting for the gpu reset to complete" DRM_ERROR message in
wait_for_errror and brings down the desktop with an -EIO/SIGBUS.
So clearly we're missing a wakeup somewhere, since the gpu reset just
doesn't take 10 seconds to complete. And indeed we're do handle the
terminally wedged state wrong.
Fix this all up.
References: https://bugs.freedesktop.org/show_bug.cgi?id=63921
References: https://bugs.freedesktop.org/show_bug.cgi?id=64073
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-rw-r--r-- | drivers/gpu/drm/i915/i915_gem.c | 7 |
1 files changed, 2 insertions, 5 deletions
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index a6cf8e843973..970ad17c99ab 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -91,14 +91,11 @@ i915_gem_wait_for_error(struct i915_gpu_error *error) { int ret; -#define EXIT_COND (!i915_reset_in_progress(error)) +#define EXIT_COND (!i915_reset_in_progress(error) || \ + i915_terminally_wedged(error)) if (EXIT_COND) return 0; - /* GPU is already declared terminally dead, give up. */ - if (i915_terminally_wedged(error)) - return -EIO; - /* * Only wait 10 seconds for the gpu reset to complete to avoid hanging * userspace. If it takes that long something really bad is going on and |