Parendi: Thousand-Way Parallel RTL Simulation

Hardware development relies on simulation, particularly cycle-accurate RTL (Register-Transfer Level) simulation, which consumes a significant amount of time.

Because single-processor performance now grows only slowly, conventional single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. One solution is parallel RTL simulation: ideally, a simulator could run on thousands of cores in parallel. However, existing simulators can exploit only tens of cores.

Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism

Manticore [1] is an attempt to co-design hardware and software for high-performance parallel RTL simulation. Manticore is the direct predecessor of Parendi: whereas Parendi studies parallel simulation on a multi-thousand-core machine, Manticore studies parallel RTL simulation on a 225-core machine that we designed and built solely for accelerating RTL simulation.
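The title names static bulk-synchronous parallelism (BSP) as Manticore's execution model. As a rough illustration of the general BSP idea applied to cycle-accurate simulation (this is not Manticore's actual compiler, ISA, or runtime), the sketch below models a design as registers with next-state functions, assigns registers to worker threads through a fixed, precomputed partition, and simulates each clock cycle as one superstep: compute, barrier, commit, barrier. The function `simulate_bsp` and the tiny counter/shift-register design are hypothetical.

```python
import threading
from typing import Callable, Dict, List

# A next-state function reads the register values committed at the end of the
# previous cycle and returns one register's value for the next cycle.
NextFn = Callable[[Dict[str, int]], int]


def simulate_bsp(next_state: Dict[str, NextFn], init: Dict[str, int],
                 partitions: List[List[str]], cycles: int) -> Dict[str, int]:
    """Hypothetical BSP-style simulator: one thread per static partition.

    Each simulated clock cycle is one superstep:
      1. compute: each partition evaluates the next-state functions of its own
         registers, reading only values committed in the previous cycle;
      2. barrier: wait until every partition has finished reading;
      3. commit: each partition publishes its new register values;
      4. barrier: wait until all commits land before the next cycle's reads.
    """
    state = dict(init)  # committed register values shared by all threads
    barrier = threading.Barrier(len(partitions))

    def worker(regs: List[str]) -> None:
        for _ in range(cycles):
            # Compute phase: no thread writes `state` during this phase.
            new_vals = {r: next_state[r](state) for r in regs}
            barrier.wait()  # everyone has read the old state
            # Commit phase: partitions own disjoint registers, so no conflicts.
            state.update(new_vals)
            barrier.wait()  # all new values visible before the next cycle

    # The register-to-thread assignment is fixed before simulation starts.
    threads = [threading.Thread(target=worker, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


# Hypothetical example design: a 4-bit counter feeding a 2-stage shift register.
NEXT_STATE: Dict[str, NextFn] = {
    "cnt": lambda s: (s["cnt"] + 1) & 0xF,
    "s0": lambda s: s["cnt"] & 1,
    "s1": lambda s: s["s0"],
}
INIT = {"cnt": 0, "s0": 0, "s1": 0}
PARTITIONS = [["cnt"], ["s0", "s1"]]  # two "cores", chosen ahead of time

if __name__ == "__main__":
    print(simulate_bsp(NEXT_STATE, INIT, PARTITIONS, cycles=3))
```

Because every read in a cycle sees only values committed at the end of the previous cycle, the result matches a single-threaded simulation regardless of how the registers are partitioned; the price is two barriers per simulated cycle. Python threads are used here only to show the structure, not to obtain a real speedup.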


Auto-partitioning Heterogeneous Task-parallel Programs with StreamBlocks

FPGAs are notoriously difficult to program. Whether you describe your logic in RTL or in a high-level programming language, you still need a fair amount of software glue code to get it to run. One prominent issue is that your application may not benefit from an FPGA at all. Most FPGA designs operate in the 200–300 MHz range, whereas an x86 desktop or server processor runs at 3 GHz or more. Therefore, to know whether a workload runs well on an FPGA, you need to be an expert in both (multi-threaded) software and hardware design.
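As a back-of-envelope illustration of that reasoning, the clock-rate gap alone means that an FPGA design at 200–300 MHz must complete 10 to 15 times more work per cycle than a 3 GHz CPU doing one operation per cycle, just to break even. The hypothetical helper below computes that ratio. The `cpu_ops_per_cycle` parameter is an assumption standing in for how much work the CPU retires per cycle, and the model deliberately ignores memory bandwidth, host-to-FPGA data transfer, and glue-code overhead.

```python
def required_fpga_work_per_cycle(cpu_hz: float, fpga_hz: float,
                                 cpu_ops_per_cycle: float = 1.0) -> float:
    """Hypothetical break-even estimate: operations an FPGA design must
    complete per clock cycle to match a CPU's throughput (crude model)."""
    cpu_throughput = cpu_hz * cpu_ops_per_cycle  # CPU operations per second
    return cpu_throughput / fpga_hz  # FPGA operations needed per FPGA cycle


if __name__ == "__main__":
    # Compare a 3 GHz CPU against typical FPGA clock rates.
    for fpga_mhz in (200, 250, 300):
        ratio = required_fpga_work_per_cycle(3e9, fpga_mhz * 1e6)
        print(f"{fpga_mhz} MHz FPGA needs {ratio:.1f} ops/cycle to break even")
```

A wider CPU (larger `cpu_ops_per_cycle`) raises the bar proportionally, which is one reason it is hard to predict FPGA benefit without expertise on both the software and hardware sides.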
