Publication | Closed Access
The effect of LUT and cluster size on deep-submicron FPGA performance and density
Citations: 411 | References: 25 | Year: 2004
Cluster Computing, Engineering, Hardware Algorithm, Computer Architecture, Hardware Security, High-performance Architecture, Parallel Computing, Cluster Size, Logic Block Architectures, Deep-submicron FPGA Performance, Electrical Engineering, Computer Engineering, Computer Science, Microelectronics, FPGA Design, FPGA Performance, Logic Synthesis, Hardware Acceleration, VLSI Architecture, Parallel Programming, Logic Density, Field-programmable Gate Arrays
FPGA performance and density depend on logic block design, in particular on lookup table (LUT) size and cluster size in island-style architectures. This paper investigates how LUT size and cluster size (the number of LUTs per cluster) affect FPGA speed and logic density. The authors synthesize benchmark circuits into a range of cluster-based architectures using a fully timing-driven experimental flow. Experiments show that clustered small LUTs (2-3 inputs) give better area than previously reported but are almost a factor of 2 slower than larger LUTs, that LUTs of 5-6 inputs give much better area than previously believed, and that a LUT size of 4-6 with 3-10 LUTs per cluster gives the best area-delay product.
In this paper, we revisit the field-programmable gate array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table (LUT), cluster-based, island-style FPGAs (Betz et al., 1997), we look at the effect of LUT size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al., 1997; Marquardt, 1999) in which a set of benchmark circuits is synthesized into different cluster-based logic block architectures (Betz and Rose, 1997, 1998; Marquardt, 1999), which contain groups of LUTs and flip-flops. First, across all architectures with LUT sizes in the range of 2 to 7 inputs and cluster sizes from 1 to 10 LUTs, we have experimentally determined the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than previously reported. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than with larger LUTs. Hence, as measured by area-delay product or by performance, these would be a bad choice. Third, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and a cluster size of 3 to 10 provide the best area-delay product for an FPGA.
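The abstract compares architectures by area-delay product. As a minimal sketch of that comparison (not the authors' flow), the Python below ranks (K, N) design points by area times delay. The function names, the data layout, and every number in `example` are hypothetical placeholders chosen for illustration; none of them are measurements from the paper.

```python
from typing import Dict, List, Tuple

# Maps an architecture (K = LUT inputs, N = LUTs per cluster)
# to a measured (area, delay) pair. Units are up to the caller.
Measurements = Dict[Tuple[int, int], Tuple[float, float]]


def area_delay_product(area: float, delay: float) -> float:
    """Return the area-delay product; lower is better."""
    return area * delay


def rank_architectures(measurements: Measurements) -> List[Tuple[int, int]]:
    """Sort (K, N) design points from best to worst area-delay product."""
    return sorted(
        measurements,
        key=lambda kn: area_delay_product(*measurements[kn]),
    )


# Hypothetical placeholder values, NOT results from the paper.
example: Measurements = {
    (2, 4): (1.0, 2.0),
    (4, 4): (1.1, 1.0),
    (7, 4): (1.6, 0.9),
}

best = rank_architectures(example)[0]
print(best)
```

With real area and delay figures from a place-and-route flow, the same ranking would show which LUT and cluster sizes sit at the minimum of the product, which is the sense in which the abstract calls a design point "best".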