TLDR

Many-core microprocessors are projected to reach 10 TFLOPS per chip within the coming decade, but pin limitations, the energy cost of electrical signaling, and non-scalable global wires limit memory and inter-core bandwidth; silicon nanophotonics has the potential to remove these limits. Corona is a 3D many-core architecture with a peak of 10 TFLOPS that uses nanophotonic communication: a photonic crossbar fully interconnects 256 low-power multithreaded cores at 20 TB/s, and dense wavelength-division-multiplexed, optically connected memory modules provide 10 TB/s of memory bandwidth. Based on simulations of a 1024-thread system running synthetic benchmarks and scaled SPLASH-2 workloads, the authors estimate that Corona can deliver 2 to 6 times the performance of an electrically connected many-core alternative with the same on-stack interconnect power on many memory-intensive workloads, while also reducing power.

Abstract

We expect that many-core microprocessors will push performance per chip from the 10-gigaflop to the 10-teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on-stack bandwidth requirements at acceptable power levels. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength-division-multiplexed, optically connected memory modules provide 10 terabytes per second of memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabytes per second. We have simulated a 1024-thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory-intensive workloads, while simultaneously reducing power.
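
The headline figures in the abstract imply a few ratios that help put the bandwidth claims in perspective. The sketch below is a back-of-envelope calculation, not something taken from the paper: it assumes decimal units (1 TB = 10^12 bytes), an even split of peak compute and crossbar bandwidth across the 256 cores, and that the simulated 1024 threads are spread evenly over those cores. The constant and function names are illustrative only.

```python
# Back-of-envelope ratios derived only from the figures quoted in the abstract.
# Assumptions (not stated in the source): decimal units, and an even split of
# compute, crossbar bandwidth, and threads across all cores.

PEAK_FLOPS = 10e12        # 10 teraflops peak floating-point performance
MEMORY_BW_BYTES = 10e12   # 10 TB/s to optically connected memory
CROSSBAR_BW_BYTES = 20e12 # 20 TB/s photonic crossbar
CORES = 256
THREADS = 1024            # size of the simulated system


def corona_ratios():
    """Return per-core and per-flop ratios implied by the headline numbers."""
    return {
        "threads_per_core": THREADS // CORES,
        "gflops_per_core": PEAK_FLOPS / CORES / 1e9,
        "memory_bytes_per_flop": MEMORY_BW_BYTES / PEAK_FLOPS,
        "crossbar_bytes_per_flop": CROSSBAR_BW_BYTES / PEAK_FLOPS,
        "crossbar_gb_per_s_per_core": CROSSBAR_BW_BYTES / CORES / 1e9,
    }


if __name__ == "__main__":
    for name, value in corona_ratios().items():
        print(f"{name}: {value}")
```

Under these assumptions, the design offers roughly one byte of memory bandwidth and two bytes of crossbar bandwidth per peak floating-point operation, which is the balance the abstract argues electrical signaling cannot sustain at this scale.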
