Understanding Performance Differences of FPGAs and GPUs

TLDR

This study seeks to clarify the performance differences between FPGAs and GPUs. Using the Rodinia benchmark suite, 15 kernels were ported to FPGAs with HLS C, and an analytical model was developed to compare their performance against GPUs. FPGAs achieve more operations per cycle in each deep pipeline but have far lower off‑chip memory bandwidth, which limits their effective parallelism relative to GPUs. For 6 of the 15 kernels, current FPGAs match or exceed GPU performance while consuming on average 28% of the GPU power; with four times the memory bandwidth, 8 kernels are projected to reach at least half of the GPU performance.

Abstract

This paper aims to better understand the performance differences between FPGAs and GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia, and port 15 of its kernels onto FPGAs using HLS C. We then propose an analytical model to compare their performance. We find that for 6 of the 15 ported kernels, today's FPGAs can match or even exceed the performance of the GPU, while consuming an average of 28% of the GPU power. Apart from their lower clock frequency, FPGAs usually achieve a higher number of operations per cycle in each customized deep pipeline, but a lower effective parallel factor because of their far lower off-chip memory bandwidth. With 4x more memory bandwidth, 8 of the 15 FPGA kernels are projected to achieve at least half of the GPU kernel performance.
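The abstract names three quantities: clock frequency, operations per cycle in each pipeline, and an effective parallel factor limited by off-chip memory bandwidth. The sketch below shows one way these quantities could combine into a throughput estimate. It is an illustrative assumption, not the paper's actual model: the multiplicative form, the bandwidth cap, the `Device` fields, the `bytes_per_op` parameter, and every device number are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Device:
    """Hypothetical device description; all fields are illustrative assumptions."""
    name: str
    freq_hz: float             # clock frequency
    ops_per_cycle: float       # operations per cycle in one pipeline (or lane)
    parallel_factor: float     # pipelines the compute fabric could instantiate
    mem_bw_bytes_per_s: float  # off-chip memory bandwidth


def effective_parallel_factor(dev: Device, bytes_per_op: float) -> float:
    """Cap the parallel factor at what off-chip bandwidth can keep fed."""
    ops_per_pipeline = dev.freq_hz * dev.ops_per_cycle
    bandwidth_limit = dev.mem_bw_bytes_per_s / (bytes_per_op * ops_per_pipeline)
    return min(dev.parallel_factor, bandwidth_limit)


def throughput(dev: Device, bytes_per_op: float) -> float:
    """Estimated operations per second under the assumed multiplicative form."""
    return (dev.freq_hz * dev.ops_per_cycle
            * effective_parallel_factor(dev, bytes_per_op))


# Made-up numbers chosen only to exercise the functions, not measured values.
fpga = Device("fpga", freq_hz=200e6, ops_per_cycle=50,
              parallel_factor=16, mem_bw_bytes_per_s=20e9)
gpu = Device("gpu", freq_hz=1e9, ops_per_cycle=1,
             parallel_factor=2048, mem_bw_bytes_per_s=200e9)
fpga_4x = replace(fpga, mem_bw_bytes_per_s=4 * fpga.mem_bw_bytes_per_s)

for dev in (fpga, fpga_4x):
    ratio = throughput(dev, bytes_per_op=1.0) / throughput(gpu, bytes_per_op=1.0)
    print(f"{dev.name}: {ratio:.2f} of GPU throughput")
```

In this toy setting the hypothetical FPGA is bandwidth-bound, so quadrupling its memory bandwidth raises its effective parallel factor and its estimated throughput by the same factor. This mirrors the kind of projection the abstract describes, without reproducing any of its numbers.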
