Acta Scientiarum Naturalium Universitatis Pekinensis


A Comprehensive Study of Executing ahead Mechanism for In-Order Microprocessors

WANG Xiaoyin, TONG Dong, DANG Xianglei, LU Junlin, CHENG Xu   

  1. Microprocessor Research and Development Center, Peking University, Beijing 100871
  • Received: 2010-01-18  Online: 2011-01-20  Published: 2011-01-20




Abstract: The authors explore the design space of in-order executing ahead processors and analyze how sensitive the executing ahead mechanism is to the cache hierarchy and memory latency. Reusing pre-executed results proves highly effective in both improving performance and reducing energy consumption. Forwarding valid data values from stores to dependent loads through a small store cache also increases performance significantly. An in-order executing ahead processor with a 32-entry store cache and a 128-entry FIFO for preserving and reusing results improves performance by 24.07% over the baseline processor, with an energy overhead of 4.93%. Furthermore, executing ahead remains necessary for hiding memory access latencies even with a very large cache hierarchy, and its performance and energy-efficiency benefits grow as memory latency increases.
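As a rough reading aid for the two headline figures, the sketch below converts them into relative execution time and a relative energy-delay product (EDP). It assumes that the 24.07% performance gain is a speedup (so execution time scales as 1/1.2407) and that the 4.93% overhead applies to total energy; the abstract does not state either definition, and EDP is a metric chosen here for illustration, not one the abstract reports. The function name `relative_metrics` is hypothetical.

```python
def relative_metrics(perf_gain_pct: float, energy_overhead_pct: float):
    """Return (relative time, relative energy, relative EDP) versus a baseline of 1.0.

    Assumption: a performance gain of p% means a speedup of (1 + p/100),
    and the energy overhead applies to total energy for the same work.
    """
    speedup = 1.0 + perf_gain_pct / 100.0
    rel_time = 1.0 / speedup                  # execution time relative to baseline
    rel_energy = 1.0 + energy_overhead_pct / 100.0
    rel_edp = rel_energy * rel_time           # energy-delay product relative to baseline
    return rel_time, rel_energy, rel_edp


# Figures quoted in the abstract: +24.07% performance, +4.93% energy.
rel_time, rel_energy, rel_edp = relative_metrics(24.07, 4.93)
print(f"relative time:   {rel_time:.3f}")
print(f"relative energy: {rel_energy:.3f}")
print(f"relative EDP:    {rel_edp:.3f}")
```

Under these assumptions, execution time falls to about 0.806 of the baseline and EDP to about 0.846, which is consistent with the abstract's claim that the mechanism improves energy efficiency despite the small energy overhead.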

Key words: executing ahead, memory latency tolerance, in-order microprocessors

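Abstract (translated from the Chinese): This paper explores the design space of the pre-execution (executing ahead) mechanism for in-order processors and quantitatively analyzes how its benefit varies with cache capacity and memory access latency. Experimental results show that, for in-order processors, both preserving and reusing the valid results produced during pre-execution and passing data between pre-executed memory access instructions effectively improve processor performance; the former also effectively reduces the energy overhead. Combining the two improves the performance of the baseline processor by 24.07% on average while increasing energy consumption by only 4.93%. It is further found that pre-execution still delivers a substantial performance improvement when the cache capacity is large. Moreover, as memory access latency increases, the advantages of pre-execution in improving the performance and energy efficiency of in-order processors become more pronounced.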

Key words (translated from the Chinese): pre-execution, memory latency tolerance, in-order processors
