summaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
Commit message (Collapse)AuthorAgeFilesLines
...
* drm/amdgpu:fix random missing of FLR NOTIFYMonk Liu2017-12-041-3/+11
| | | | | | Signed-off-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:implement new GPU recover(v3)Monk Liu2017-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1,new imple names amdgpu_gpu_recover which gives more hint on what it does compared with gpu_reset 2,gpu_recover unify bare-metal and SR-IOV, only the asic reset part is implemented differently 3,gpu_recover will increase hang job karma and mark its entity/context as guilty if exceeds limit V2: 4,in scheduler main routine the job from guilty context will be immedialy fake signaled after it poped from queue and its fence be set with "-ECANCELED" error 5,in scheduler recovery routine all jobs from the guilty entity would be dropped 6,in run_job() routine the real IB submission would be skipped if @skip parameter equales true or there was VRAM lost occured. V3: 7,replace deprecated gpu reset, use new gpu recover Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/virt: implement wait_reset callbacks for vi/aipding2017-12-041-0/+1
| | | | | | | Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: pding <Pixel.Ding@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: SR-IOV data exchange between PF&VFHorace Chen2017-10-191-0/+6
| | | | | | | | | | | | | | | | | SR-IOV need to exchange some data between PF&VF through shared VRAM PF will copy some necessary firmware and information to the shared VRAM. It also requires some information from VF. PF will send a key through mailbox2 to help guest calculate checksum so that it can verify whether the data is correct. So check the data on the specified offset of the shared VRAM, if the checksum is right, read values from it and write some VF information next to the data from PF. Signed-off-by: Horace Chen <horace.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.Gavin Wan2017-07-141-20/+26
| | | | | | | | | | | | | | This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:only call flr_work under infinite timeoutMonk Liu2017-05-241-6/+9
| | | | | | Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:use job* to replace voluntaryMonk Liu2017-05-241-1/+1
| | | | | | | | | | that way we can know which job cause hang and can do per sched reset/recovery instead of all sched. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/virt: change AI ack-irq message to debug levelXiangliang Yu2017-05-241-1/+1
| | | | | | | | Change message to debug level as VI does. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:need som change on vega10 mailboxMonk Liu2017-05-241-8/+10
| | | | | | | | | | | | | | | | | | | | | if sriov gpu reset is invoked by job timeout, it is run in a global work-queue which is very slow and better not call msleep ortherwise it takes long time to get back CPU. so make below changes: 1: Change msleep 1 to mdelay 5 2: Ignore the ack fail from pf after time out, because VF FLR will clear ack, sometime VF FLR is done prior to the beginning of poll_ack so we can ignore this ack TODO: Put job_timedout (and the following gpu reset) in a driver thread, instead of the global work_struct. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:fix cannot receive rcv/ack irq bugMonk Liu2017-05-241-2/+2
| | | | | | Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:implement the reset MB func for vega10Monk Liu2017-04-061-0/+133
| | | | | | | | | they are lack in the bringup stage, we need them for GPU reset feature. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu:fix typo for mxgpu_aiMonk Liu2017-04-061-2/+2
| | | | | | Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
* drm/amdgpu/virt: impl mailbox for aiXiangliang Yu2017-03-291-0/+207
Implement mailbox protocol for AI so that guest vf can communicate with GPU hypervisor. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>