UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c" already call CpuPause() in their loop bodies; see SmmWaitForApArrival(), APHandler(), and SmiRendezvous(). However, the "main wait" within APHandler(): > // > // Wait for something to happen > // > WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run); doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore without pausing. The performance impact is especially notable in QEMU/KVM + OVMF virtualization with CPU overcommit (that is, when the guest has significantly more VCPUs than the host has physical CPUs). The guest BSP is working heavily in: BSPHandler() [MpService.c] PerformRemainingTasks() [PiSmmCpuDxeSmm.c] SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c] while the many guest APs are spinning in the "Wait for something to happen" semaphore acquisition, in APHandler(). The guest APs are generating useless memory traffic and saturating host CPUs, hindering the guest BSP's progress in SetUefiMemMapAttributes(). Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration after the first check fails. Due to Pause Loop Exiting (known as Pause Filter on AMD), the host scheduler can favor the guest BSP over the guest APs. Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to less than 4 minutes. The patch should benefit physical machines as well -- according to the Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding PAUSE to the generic WaitForSemaphore() function is considered a general improvement. Cc: Eric Dong <eric.dong@intel.com> Cc: Philippe Mathieu-Daudé <philmd@redhat.com> Cc: Rahul Kumar <rahul1.kumar@intel.com> Cc: Ray Ni <ray.ni@intel.com> Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Message-Id: <20200729185217.10084-1-lersek@redhat.com> Reviewed-by: Eric Dong <eric.dong@intel.com>
author: Laszlo Ersek <lersek@redhat.com> 2020-07-29 20:52:17 +0200
committer: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> 2020-07-31 13:27:50 +0000
commit: 9001b750df64b25b14ec45a2efa1361a7b96c00a (patch)
tree: 981956d3bc29edc0e567b60397a4f5ee41143cbe /UefiCpuPkg/PiSmmCpuDxeSmm
parent: 656419f922c047a3c48bd3f4ecea7d8e87d0b761 (diff)
download: edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.tar.gz
edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.tar.bz2
edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.zip
1 files changed, 11 insertions, 7 deletions
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 57e788c01b..4bcd217917 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -40,14 +40,18 @@ WaitForSemaphore (
 {
   UINT32                            Value;
 
-  do {
+  for (;;) {
     Value = *Sem;
-  } while (Value == 0 ||
-           InterlockedCompareExchange32 (
-             (UINT32*)Sem,
-             Value,
-             Value - 1
-             ) != Value);
+    if (Value != 0 &&
+        InterlockedCompareExchange32 (
+          (UINT32*)Sem,
+          Value,
+          Value - 1
+          ) == Value) {
+      break;
+    }
+    CpuPause ();
+  }
   return Value - 1;
 }
author	Laszlo Ersek <lersek@redhat.com>	2020-07-29 20:52:17 +0200
committer	mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2020-07-31 13:27:50 +0000
commit	9001b750df64b25b14ec45a2efa1361a7b96c00a (patch)
tree	981956d3bc29edc0e567b60397a4f5ee41143cbe /UefiCpuPkg/PiSmmCpuDxeSmm
parent	656419f922c047a3c48bd3f4ecea7d8e87d0b761 (diff)
download	edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.tar.gz edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.tar.bz2 edk2-9001b750df64b25b14ec45a2efa1361a7b96c00a.zip