Parendi

Hardware development relies on simulation, particularly cycle-accurate RTL (Register Transfer Level) simulation, which consumes significant time.

As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. One solution is parallel RTL simulation; ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.

We study the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, which has 1472 cores per chip). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each factor and analyze how it affects parallel simulation performance, drawing on the contrast between the large-scale IPU and smaller but faster x86 systems.
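To make the three-way balance concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions and is not taken from the paper: it assumes a bulk-synchronous scheme in which every simulated cycle costs the slowest core's computation plus a fixed communication cost and a fixed synchronization cost. The function names, the perfectly balanced partitioning, and the abstract time units are all hypothetical.

```python
def cycle_time(work_per_core, comm_cost, sync_cost):
    """Cost of one simulated RTL cycle in the toy model (assumed, not measured).

    The cycle ends only when the slowest core has finished computing,
    data has been exchanged, and all cores have synchronized.
    """
    return max(work_per_core) + comm_cost + sync_cost


def simulation_rate(total_work, cores, comm_cost, sync_cost):
    """Simulated cycles per unit of time, assuming perfectly balanced work."""
    per_core = [total_work / cores] * cores
    return 1.0 / cycle_time(per_core, comm_cost, sync_cost)


if __name__ == "__main__":
    # Hypothetical numbers: the design needs 1024 units of work per cycle,
    # and communication and synchronization each cost 8 units.
    for cores in (16, 128, 1024):
        rate = simulation_rate(1024.0, cores, comm_cost=8.0, sync_cost=8.0)
        print(f"{cores:5d} cores: {rate:.4f} cycles per time unit")
```

In this model, adding cores shrinks only the computation term, so the fixed communication and synchronization costs eventually dominate. That is one way to see why all three factors have to be balanced rather than computation alone.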

Using this analysis, we build Parendi¹, an RTL simulator for the IPU. It distributes the RTL simulation across 5888 cores on 4 IPU sockets (1472 cores per socket).

Publication

Mahyar Emami, Thomas Bourgeat, James R. Larus. “Parendi: Thousand-Way Parallel RTL Simulation”. ASPLOS’25 (paper, slides, code).

¹ Parendi (or Pārendi) is the female Zoroastrian angel (i.e., êzaḏ, ایزد in Persian) of abundance. She is very likely related to, or the same as, the Vedic goddess Purandhi.