<p dir="ltr">In the era of heterogeneous computing, CPUs are responsible for executing workloads that are difficult to accelerate on GPUs or specialized hardware. These workloads contain irregular control flow, featuring abundant data-dependent and hard-to-predict branches. Frequent branch mispredictions in such workloads incur significant performance overhead. In modern deeply-pipelined out-of-order processors, recovering from a misprediction is not instantaneous. During recovery, the instructions that follow the mispredicted branch, known as wrong-path instructions, are discarded. However, not all wrong-path instructions are data- or control-dependent on the mispredicted branch, and some perform useful work worth salvaging. This motivates the exploration of a mechanism that exploits the processor's recovery period to execute wrong-path load instructions. We propose Delayed Backend Squash (DBS), which postpones the squash of the processor's back-end until the end of the recovery period following a branch misprediction, allowing additional wrong-path load instructions already present in the instruction window to execute and access the memory hierarchy. Although these load instructions originate from the wrong path, a significant portion of them generate accurate memory accesses and act as useful prefetches for instructions on the correct path. Our evaluation shows that DBS improves performance by up to 4.9% on a baseline out-of-order processor without hardware prefetchers or architectural state checkpoints. In more advanced configurations with stride prefetchers and architectural checkpoints, the maximum speedup drops to 1.74%. We find that while DBS offers a simple in-core mechanism to exploit wrong-path memory accesses, certain limitations restrict its practical effectiveness.</p>
Degree Type
Master of Science in Electrical and Computer Engineering