Publication | Closed Access
Larrabee
Citations: 780 | References: 41 | Year: 2008
Parallelism Relative, GPU Architecture, Engineering, Performance Analysis, Edge Computing, High-performance Architecture, Many-core Architecture, Computer Architecture, Computer Engineering, Manycore Programming Model, Parallel Programming, Computer Science, Parallel Computing, Manycore Processor, GPU Computing
This paper introduces Larrabee, a many-core visual computing architecture, and describes its software rendering pipeline, many-core programming model, and performance analysis across several applications. Larrabee combines many in-order x86 cores, each augmented by a wide vector unit, with fixed-function logic and a coherent on-die second-level cache; task scheduling is handled entirely in software. Its binning-based rendering pipeline reduces memory bandwidth and lock contention, and its native programming model supports highly parallel applications with irregular data structures. Larrabee offers markedly higher performance per watt and per unit area than out-of-order CPUs on highly parallel workloads, greater flexibility and programmability than standard GPUs, and shows potential across a broad range of parallel applications.
This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely with software in Larrabee, rather than in fixed function logic. The customizable software graphics rendering pipeline for this architecture uses binning in order to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
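The abstract says the rendering pipeline uses binning to cut memory bandwidth and lock contention. The following is a minimal sketch of the general binning idea, not the paper's pipeline. It assumes flat-colored screen-space triangles and a hypothetical 64-pixel tile. A front end sorts each triangle into every tile its bounding box overlaps. A back end then rasterizes each tile into a private buffer. Because no two workers ever write the same pixels, the back end needs no locks. In a real implementation, the tile size can also be chosen so that each tile's working set stays in cache. The names `Triangle`, `bin_triangles`, `shade_tile`, and `render` are hypothetical, and the thread pool only stands in for software task scheduling.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

TILE = 64  # hypothetical tile size in pixels; not taken from the paper


@dataclass(frozen=True)
class Triangle:
    v0: tuple[float, float]  # screen-space vertices (x, y)
    v1: tuple[float, float]
    v2: tuple[float, float]
    color: int               # flat color id; 0 is reserved for background


def edge(a, b, px, py):
    """Signed edge function: its sign tells which side of a->b the point lies on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def covers(tri, px, py):
    """True if (px, py) lies inside tri, for either winding order."""
    w0 = edge(tri.v1, tri.v2, px, py)
    w1 = edge(tri.v2, tri.v0, px, py)
    w2 = edge(tri.v0, tri.v1, px, py)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def bin_triangles(tris, width, height, tile=TILE):
    """Front end: append each triangle to every tile its bounding box touches."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for tri in tris:  # iterating in submission order keeps every bin ordered
        xs = (tri.v0[0], tri.v1[0], tri.v2[0])
        ys = (tri.v0[1], tri.v1[1], tri.v2[1])
        x0 = max(0, int(min(xs)) // tile)
        x1 = min(tiles_x - 1, int(max(xs)) // tile)
        y0 = max(0, int(min(ys)) // tile)
        y1 = min(tiles_y - 1, int(max(ys)) // tile)
        for ty in range(y0, y1 + 1):  # empty ranges skip off-screen triangles
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(tri)
    return bins


def shade_tile(key, tris, width, height, tile=TILE):
    """Back end: rasterize one tile into a private buffer, so no locks are needed."""
    tx, ty = key
    x_base, y_base = tx * tile, ty * tile
    w, h = min(tile, width - x_base), min(tile, height - y_base)
    buf = [[0] * w for _ in range(h)]
    for tri in tris:  # later triangles overwrite earlier ones (painter's order)
        if edge(tri.v0, tri.v1, tri.v2[0], tri.v2[1]) == 0:
            continue  # skip zero-area triangles
        for j in range(h):
            for i in range(w):
                if covers(tri, x_base + i + 0.5, y_base + j + 0.5):  # pixel center
                    buf[j][i] = tri.color
    return key, buf


def render(tris, width, height, workers=4):
    """Bin, shade non-empty tiles on a thread pool, then copy tiles into the frame."""
    bins = bin_triangles(tris, width, height)
    frame = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(shade_tile, k, b, width, height) for k, b in bins.items() if b]
        for job in jobs:
            (tx, ty), buf = job.result()
            for j, row in enumerate(buf):
                frame[ty * TILE + j][tx * TILE : tx * TILE + len(row)] = row
    return frame
```

For example, `render([Triangle((0, 0), (100, 0), (0, 100), 1), Triangle((5, 5), (20, 5), (5, 20), 2)], 128, 128)` produces a frame where pixel `[6][6]` has color 2, because the later triangle wins within its bin. Under CPython's global interpreter lock, these threads illustrate the lock-free structure but do not actually run the rasterization loops in parallel. The sketch also does not model Larrabee's wide vector unit.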