Publication | Closed Access
Numerically stable, single-pass, parallel statistics algorithms
Citations: 52
References: 6
Year: 2009
Unknown Venue
Cluster Computing, Engineering, Parallel Implementation, Statistical Analysis, Parallel Analysis, Data Science, Parallel Complexity Theory, Scalability Study, Parallel Computing, Principal Component Analysis, Statistics, Massively-parallel Computing, Multidimensional Analysis, Computer Science, Functional Data Analysis, Computational Science, Parallel Processing, Parallel Programming, Parallel Scalability, Parallel Statistics
Statistical analysis is widely used across scientific applications to analyze data and infer meaning from it. A key challenge for any statistical analysis package aimed at large-scale, distributed data is addressing the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow single-pass, yet numerically robust, pairwise parallel and incremental updates of arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open-source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. A scalability study demonstrates numerically stable, near-optimal scalability on up to 128 processes. We also present results in which the framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
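To make the idea of single-pass, pairwise-mergeable updates concrete, the following is a minimal sketch restricted to the second-order case (mean and variance). It uses the well-known incremental (Welford) and pairwise (Chan, Golub, and LeVeque) update formulas, not the arbitrary-order formulas derived in the paper, which the abstract does not reproduce. The class name `Moments` and its methods are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean, and centered second moment (illustrative only)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Incremental (one observation at a time) update."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old and the new mean.
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        """Pairwise update: combine two partial results computed independently."""
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        """Unbiased sample variance; requires at least two observations."""
        return self.m2 / (self.n - 1)


if __name__ == "__main__":
    # Two "processes" each scan their own chunk once, then results are merged.
    left, right = Moments(), Moments()
    for x in (1e9 + 4, 1e9 + 7):
        left.update(x)
    for x in (1e9 + 13, 1e9 + 16):
        right.update(x)
    print(left.merge(right).variance())
```

Because each partial result carries only a count, a mean, and a centered sum, partial results can be combined in any reduction order, which is what makes this style of update suitable for parallel use. Working with centered quantities also avoids subtracting two large, nearly equal sums, the usual source of cancellation error in the naive sum-of-squares formula.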