summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
diff options
context:
space:
mode:
authorGuchun Chen <guchun.chen@amd.com>2020-04-16 23:41:07 +0800
committerAlex Deucher <alexander.deucher@amd.com>2020-04-22 18:11:46 -0400
commit12c17b9d62663c14a5343d6742682b3e67280754 (patch)
treea636bfd72445599432b2165a2d59c205606cacb5 /drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
parent69d0c18dda2dae4cfbf73d4ffaa5aff6dc69894a (diff)
downloadlinux-stable-12c17b9d62663c14a5343d6742682b3e67280754.tar.gz
linux-stable-12c17b9d62663c14a5343d6742682b3e67280754.tar.bz2
linux-stable-12c17b9d62663c14a5343d6742682b3e67280754.zip
drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
When running ras uncorrectable error injection and triggering GPU reset on sGPU, below issue is observed. It's caused by the list uninitialized when accessing. [ 80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750 [ 80.047300] #PF: supervisor write access in kernel mode [ 80.047351] #PF: error_code(0x0003) - permissions violation [ 80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061 [ 80.047477] Oops: 0003 [#1] SMP PTI [ 80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G OE 5.4.0-rc7-guchchen #1 [ 80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018 [ 80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu] Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c')
-rw-r--r--drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c5
1 files changed, 3 insertions, 2 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 73ae913aee26..68b82f7b0b80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1453,9 +1453,10 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, false);
/* Build list of devices to query RAS related errors */
- if (hive && adev->gmc.xgmi.num_physical_nodes > 1) {
+ if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
device_list_handle = &hive->device_list;
- } else {
+ else {
+ INIT_LIST_HEAD(&device_list);
list_add_tail(&adev->gmc.xgmi.head, &device_list);
device_list_handle = &device_list;
}