Publication | Closed Access
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
Citations: 65
References: 38
Year: 2007
Engineering, Computer Architecture, Network Analysis, VLIW Processors, Interconnection Network Architecture, Embedded Systems, Processor Architecture, Communication Architecture, Clustered VLIW Processors, Hardware Architecture, Hardware Security, Intercluster Communication Mechanisms, High-performance Architecture, Systems Engineering, Parallel Computing, Manycore Processor, Computer Engineering, Interconnection Network, Computer Science, Clustered VLIW Architectures, Many-core Architecture, Network Integration, System Software, Performance Evaluation Methodology
VLIW processors have started gaining acceptance in the embedded systems domain. However, VLIW processors with a monolithic register file and a large number of functional units (FUs) are not viable: the register file needs a large number of ports to serve the FUs, which makes it expensive and extremely slow. A simple solution is to break the register file into a number of smaller register files, each connected to a subset of the FUs. These architectures are termed clustered VLIW processors. In this article, we first build a case for clustered VLIW processors with four or more clusters by showing that the achievable ILP in most of the media applications for a 16 ALU and 8 LD/ST VLIW processor is around 20. We then provide a classification of the intercluster interconnection design space, and show that a large part of this design space is currently unexplored. Next, using our performance evaluation methodology, we evaluate a subset of this design space and show that the most commonly used type of interconnection, RF-to-RF, fails to meet achievable performance by a large factor, while certain other types of interconnections can lower this gap considerably. We also establish that this behavior is heavily application dependent, emphasizing the importance of application-specific architecture exploration. We further present results about the statistical behavior of these different architectures by varying the number of clusters in our framework from 4 to 16. These results clearly show the advantages of one specific architecture over others. Finally, based on our results, we propose a new interconnection network, which should lower this performance gap.
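The port-count argument in the abstract can be made concrete with a rough back-of-the-envelope sketch. The per-FU port figures below (two read ports and one write port for every functional unit, ALU or LD/ST alike) are illustrative assumptions and do not come from the article; the function name is likewise hypothetical. The sketch also ignores any extra ports that intercluster communication itself may require.

```python
# Assumed per-FU port needs (illustrative only, not taken from the article).
READ_PORTS_PER_FU = 2
WRITE_PORTS_PER_FU = 1


def ports_per_register_file(num_alus, num_ldst, num_clusters=1):
    """Estimate the ports each register file needs when the functional
    units are split evenly across num_clusters clusters."""
    total_fus = num_alus + num_ldst
    if num_clusters < 1 or total_fus % num_clusters != 0:
        raise ValueError("functional units must divide evenly across clusters")
    fus_per_cluster = total_fus // num_clusters
    return fus_per_cluster * (READ_PORTS_PER_FU + WRITE_PORTS_PER_FU)


if __name__ == "__main__":
    # The 16 ALU + 8 LD/ST configuration mentioned in the abstract.
    for clusters in (1, 4, 8):
        ports = ports_per_register_file(16, 8, clusters)
        print(f"{clusters} cluster(s): {ports} ports per register file")
```

Under these assumptions the monolithic case (one cluster) needs several times as many ports on its single register file as any one register file in a four-cluster split, which is the scaling pressure that motivates clustering.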