Auto-partitioning Heterogeneous Task-parallel Programs with StreamBlocks

FPGAs are notoriously difficult to program. Whether you describe your logic in RTL or in a high-level programming language, you still need a fair amount of software glue code to get it to run. A more fundamental issue is that your application may not benefit from an FPGA at all. Most FPGAs operate in the 200–300 MHz range, whereas an x86 desktop or server processor runs at 3 GHz or more, so a hardware design has to exploit enough parallelism to make up for a roughly tenfold clock-rate disadvantage. To know in advance whether something will run well on an FPGA, you therefore need to be an expert in both (multi-threaded) software and hardware design.
StreamBlocks is a unified compiler that attempts to make this situation better. StreamBlocks’ philosophy is that you should write your code once, and then a compiler should assist you in dividing your work between multicore CPUs and FPGAs.
A single-language system with an appropriate programming model and a compiler that targets both platforms turns this tedious exploration into a simple recompile with new compiler directives.
StreamBlocks is augmented with a profile-guided auto-partitioning tool that helps identify the best hardware-software partitions. We demonstrate that our compiler can find the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments show a 4–7× speedup over trivial partitions. This speedup is achieved automatically, with zero code modifications.
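To give a feel for what profile-guided partitioning involves, here is a minimal sketch in Python. It is not the formulation used by StreamBlocks: the actor names, the profile numbers, the per-token transfer cost, and the cost model itself (software actors share the available cores, hardware actors run concurrently, and every token that crosses the CPU-FPGA boundary pays a fixed price) are all assumptions made up for illustration. The sketch simply enumerates every assignment of actors to software or hardware and keeps the one with the lowest estimated time.

```python
from itertools import product

# Hypothetical profile data: time (ms) each actor takes in software and in hardware.
PROFILE = {
    "source": {"sw": 2.0, "hw": 1.0},
    "filter": {"sw": 40.0, "hw": 4.0},
    "sink": {"sw": 3.0, "hw": 30.0},
}

# Hypothetical channels: (producer, consumer, number of tokens transferred).
CHANNELS = [("source", "filter", 1000), ("filter", "sink", 1000)]

CROSSING_MS_PER_TOKEN = 0.01  # assumed cost of moving one token between CPU and FPGA
CORES = 2  # assumed number of CPU cores shared by the software actors


def estimate(assignment):
    """Estimate execution time (ms) of one partition under the assumed cost model."""
    # Software actors share the cores, so their total work is divided among them.
    sw = sum(PROFILE[a]["sw"] for a, side in assignment.items() if side == "sw")
    # Hardware actors run concurrently, so the slowest one dominates.
    hw = max(
        (PROFILE[a]["hw"] for a, side in assignment.items() if side == "hw"),
        default=0.0,
    )
    # Channels whose endpoints sit on different sides pay a per-token transfer cost.
    crossing = sum(
        tokens * CROSSING_MS_PER_TOKEN
        for producer, consumer, tokens in CHANNELS
        if assignment[producer] != assignment[consumer]
    )
    return max(sw / CORES, hw) + crossing


def best_partition():
    """Try every software/hardware assignment and return the cheapest one."""
    actors = list(PROFILE)
    candidates = (
        dict(zip(actors, sides))
        for sides in product(("sw", "hw"), repeat=len(actors))
    )
    return min(candidates, key=estimate)


if __name__ == "__main__":
    partition = best_partition()
    print(partition, estimate(partition))
```

With these made-up numbers, neither the all-software nor the all-hardware partition is the cheapest: the search places the two compute-heavy actors in hardware and leaves the sink in software. Exhaustive enumeration grows exponentially with the number of actors, so it is only practical for toy examples like this one.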
Read more about StreamBlocks below and access the code via GitHub:
Publication
Mahyar Emami*, Endri Bezati*, Jörn W. Janneck, and James R. Larus. 2023. “Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks.” In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT ’22). Paper, slides, code. (* Equal contributor)