Publication | Closed Access
Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency
Citations: 12
References: 0
Year: 2020
Unknown Venue
Engineering, Computer Architecture, HLS Designs, Electromagnetic Compatibility, High-performance Architecture, Timing Analysis, Systems Engineering, FPGA HLS, Hardware Description Language, Realistic HLS Designs, Parallel Computing, Ultra-low Latency, Electrical Engineering, High-frequency Device, Maximum Frequency, Computer Engineering, Computer Science, FPGA Design, Signal Processing, Implicit Broadcasts, HLS Delay Model
Designs generated by high-level synthesis (HLS) tools typically achieve a lower maximum frequency than manually written RTL designs. We study the timing issues in a diverse set of nine realistic HLS designs and observe that, in most cases, the frequency degradation is related to signal broadcast structures. In this work, we classify the common broadcast types in HLS designs: the data signal broadcast and two types of control signal broadcast, namely the pipeline control broadcast and the synchronization signal broadcast. We further identify several common limitations of current HLS tools that lead to improper handling of these broadcasts. First, the HLS delay model does not account for the extra delay caused by broadcasts, so the scheduling results are suboptimal. To address this, we implement a comprehensive set of synthetic designs and benchmark the extra delay to calibrate the HLS delay model. Second, HLS tools use back-pressure signals for pipeline control, which leads to large broadcasts. Instead, we propose skid-buffer-based pipeline control, in which the back-pressure signal is removed and an extra skid buffer (a FIFO) is used for flow control. We use dynamic programming to minimize the area of this extra FIFO. Third, redundant synchronization among concurrent modules may lead to very large broadcasts. We propose methods to identify and prune unnecessary synchronization signals. Our solutions boost the frequency of nine real-world HLS benchmarks by 53% on average, with marginal area and latency overhead. In some cases, the gain is more than 100 MHz.
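The skid-buffer idea can be illustrated with a small cycle-level model. This is a minimal sketch under stated assumptions, not the implementation from the paper: the function name `simulate_skid_pipeline`, its parameters, and the threshold rule (accept a new input only while the FIFO occupancy is below `capacity - depth`) are hypothetical choices made for illustration. In the model, the pipeline stages advance every cycle without any stall signal; only the pipeline input looks at the FIFO occupancy, and the FIFO at the output absorbs whatever is still in flight when the consumer is not ready.

```python
from collections import deque


def simulate_skid_pipeline(items, depth, ready_pattern, capacity=None):
    """Cycle-level model of a stall-free pipeline followed by a skid FIFO.

    items         -- input values (must not be None; None marks a bubble)
    depth         -- number of pipeline stages (each stage is a pure delay here)
    ready_pattern -- repeating list of booleans: is the consumer ready this cycle?
    capacity      -- FIFO capacity (assumed default: depth + 2)

    Returns (outputs, max_fifo_occupancy, cycles).
    """
    if capacity is None:
        capacity = depth + 2  # assumption made for this sketch only
    if capacity <= depth:
        raise ValueError("FIFO must be able to hold every in-flight item")
    if not any(ready_pattern):
        raise ValueError("consumer must be ready at least once per period")

    stages = [None] * depth      # pipeline registers; they never stall
    fifo = deque()               # skid buffer at the pipeline output
    pending = deque(items)
    total = len(pending)
    outputs = []
    max_occupancy = 0
    cycle = 0

    while len(outputs) < total:
        occupancy_at_start = len(fifo)  # value the input gate observes

        # Consumer pops one item when it is ready.
        if ready_pattern[cycle % len(ready_pattern)] and fifo:
            outputs.append(fifo.popleft())

        # The last stage always drains into the FIFO; no stall is broadcast.
        tail = stages[-1]
        if tail is not None:
            assert len(fifo) < capacity, "skid FIFO overflow"
            fifo.append(tail)

        # Only the pipeline input is gated, using the FIFO occupancy.
        accept = bool(pending) and occupancy_at_start < capacity - depth
        head = pending.popleft() if accept else None
        stages = [head] + stages[:-1]

        max_occupancy = max(max_occupancy, len(fifo))
        cycle += 1

    return outputs, max_occupancy, cycle


if __name__ == "__main__":
    data = list(range(20))
    out, peak, cycles = simulate_skid_pipeline(
        data, depth=4, ready_pattern=[True, False, False], capacity=6
    )
    print(out == data, peak, cycles)
```

The gate keeps the sum of FIFO occupancy and in-flight items at or below `capacity`, so the FIFO cannot overflow even though no stage ever receives a stall signal. In this model a larger `capacity` allows more items in flight while the consumer is stalled, at the cost of a bigger buffer.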