TLDR

Large‑scale supercomputing simulations drive modern scientific discovery, yet data analysis is often performed offline or on smaller clusters, leading to performance and energy inefficiencies from excessive data movement between compute and storage subsystems. The authors propose Active Flash, an in‑situ data analysis approach that processes data directly on the solid‑state device where it resides. Active Flash executes analysis on the SSD controller, integrating computation with storage to eliminate the need to move data to separate compute nodes. Performance and energy models indicate that Active Flash can mitigate these inefficiencies without harming simulation performance, and a prototype built on a commercial SSD controller confirms its feasibility.

Abstract

Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or on the supercomputer itself. Unfortunately, these techniques suffer from performance and energy inefficiencies due to increased data movement between the compute and storage subsystems. Therefore, we propose Active Flash, an in-situ scientific data analysis approach, wherein data analysis is conducted on the solid-state device (SSD), where the data already resides. Our performance and energy models show that Active Flash has the potential to address many of the aforementioned concerns without degrading HPC simulation performance. In addition, we demonstrate an Active Flash prototype built on a commercial SSD controller, which reaffirms the viability of our proposal.
