Parendi: Thousand-Way Parallel RTL Simulation

Hardware development relies on simulation, particularly cycle-accurate RTL (Register Transfer Level) simulation, which consumes significant time.
We study the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine, the 1472-core Graphcore IPU. Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each factor and analyze how it affects parallel simulation performance, drawing on the contrast between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi¹, an RTL simulator for the IPU. It distributes the RTL simulation across 5888 cores on 4 IPU sockets.
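As a rough illustration of the three-factor balance described above, the following Python sketch models the time of one simulated cycle as the sum of computation, communication, and synchronization. This is a toy model under stated assumptions, not Parendi's actual cost model: the function names, the perfectly balanced work split, and the fixed per-cycle communication and synchronization costs are all hypothetical. Only the core counts come from the text.

```python
# Toy per-cycle cost model. Illustrative assumption only; not taken from Parendi.

CORES_PER_IPU = 1472   # from the text: the IPU is a 1472-core machine
IPU_SOCKETS = 4        # from the text: Parendi runs on 4 IPU sockets
total_cores = CORES_PER_IPU * IPU_SOCKETS


def cycle_time(work, cores, sync_cost, comm_cost):
    """Estimate the time of one simulated RTL cycle (arbitrary time units).

    Assumes (hypothetically) that work divides evenly across cores and that
    communication and synchronization cost a fixed amount per cycle whenever
    more than one core participates.
    """
    compute = work / cores
    if cores == 1:
        return compute  # a single core needs no communication or barrier
    return compute + comm_cost + sync_cost


def speedup(work, cores, sync_cost, comm_cost):
    """Speedup over a single core under the same toy model."""
    return cycle_time(work, 1, sync_cost, comm_cost) / cycle_time(
        work, cores, sync_cost, comm_cost
    )


if __name__ == "__main__":
    print("total cores:", total_cores)
    # A small design: fixed overheads dominate, so speedup stays far below 1472.
    print("small design:", speedup(1472.0, CORES_PER_IPU, sync_cost=2.0, comm_cost=1.0))
    # A design 100x larger amortizes the same overheads much better.
    print("large design:", speedup(147200.0, CORES_PER_IPU, sync_cost=2.0, comm_cost=1.0))
```

In this sketch, adding cores shrinks only the computation term, so the fixed synchronization and communication costs bound the achievable speedup. That is the kind of trade-off the analysis above examines, though the real costs are measured rather than assumed.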
Publication
Mahyar Emami, Thomas Bourgeat, James R. Larus. “Parendi: Thousand-Way Parallel RTL Simulation”. ASPLOS’25 (paper, slides, code).
1 Parendi (or Pārendi) is the female Zoroastrian angel (i.e., êzaḏ, ایزد in Persian) of abundance. She is very likely related to, or the same as, the Vedic goddess Purandhi.