Publication | Open Access
A flexible high-performance simulator for verifying and benchmarking quantum circuits implemented on real hardware
Citations: 147 | References: 50 | Year: 2019
The authors present qFlex, a flexible tensor-network quantum circuit simulator aimed at random quantum circuits of the sizes expected for supremacy experiments. qFlex computes both exact and low-fidelity amplitudes, and is benchmarked on square lattices and Google’s Bristlecone QPU. The authors also introduce a technique that removes the rejection-sampling overhead of most tensor-network approaches, and a novel multithreaded, cache-efficient tensor-index permutation algorithm. A simulation at fidelity f costs a factor of 1/f less than a perfect-fidelity one. On NASA’s Pleiades and Electra clusters combined, the most demanding run peaked at 20 PFLOPS in single precision (64% of the maximum achievable performance), the largest computation ever run on NASA HPC clusters in terms of sustained FLOPs and number of nodes used.
Abstract: Here we present qFlex, a flexible tensor network-based quantum circuit simulator. qFlex can compute both exact amplitudes, essential for the verification of the quantum hardware, and low-fidelity amplitudes, to mimic sampling from Noisy Intermediate-Scale Quantum (NISQ) devices. In this work, we focus on random quantum circuits (RQCs) in the range of sizes expected for supremacy experiments. Fidelity f simulations are performed at a cost that is lower than that of perfect-fidelity ones by a factor of 1/f. We also present a technique to eliminate the overhead introduced by rejection sampling in most tensor network approaches. We benchmark the simulation of square lattices and Google’s Bristlecone QPU. Our analysis is supported by extensive simulations on NASA HPC clusters Pleiades and Electra. For our most computationally demanding simulation, the two clusters combined reached a peak of 20 Peta Floating Point Operations per Second (PFLOPS) (single precision), i.e., 64% of their maximum achievable performance, which represents the largest numerical computation in terms of sustained FLOPs and the number of nodes utilized ever run on NASA HPC clusters. Finally, we introduce a novel multithreaded, cache-efficient tensor index permutation algorithm of general application.
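To make concrete what a tensor index permutation is, the following is a minimal Python sketch using NumPy. It is not qFlex's multithreaded, cache-efficient algorithm, which this page does not describe; it only shows the operation that such an algorithm has to carry out. The helper name `permute_indices` and the toy rank-3 tensor are assumptions made for illustration.

```python
import numpy as np


def permute_indices(tensor, perm):
    """Return a C-contiguous copy of `tensor` with its indices reordered.

    Hypothetical helper for illustration only; not part of qFlex.
    Axis n of the result is axis perm[n] of the input.
    """
    # np.transpose returns a strided view without moving any data.
    view = np.transpose(tensor, perm)
    # ascontiguousarray performs the actual data movement in memory,
    # which is the step a cache-efficient algorithm would optimize.
    return np.ascontiguousarray(view)


# Toy rank-3 tensor with dimension 2 per index, in single-precision complex.
t = np.arange(8, dtype=np.complex64).reshape(2, 2, 2)

# Move the last index to the front: p[k, i, j] == t[i, j, k].
p = permute_indices(t, (2, 0, 1))
```

The values are unchanged; only their layout in memory differs. For large tensors this reordering is bound by memory access rather than arithmetic, which is presumably why the abstract stresses cache efficiency.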