Publication | Closed Access
Large-Scale Parallel Statistical Forecasting Computations in R
Citations: 10
References: 17
Year: 2011
Unknown Venue
Cluster Computing, Engineering, Forecasting Application, Map-reduce, Mapreduce Paradigm, Parallel Computational Infrastructure, Data Science, Data-intensive Platform, Management, Statistical Computing, Data Integration, Parallel Computing, Data Management, High-performance Data Analytics, Massively-parallel Computing, Predictive Analytics, Computer Science, Forecasting, Data-intensive Computing, Parallel Processing, Cloud Computing, Parallel Programming, Data-level Parallelism, Massive Data Processing, Big Data
We demonstrate the utility of massively parallel computational infrastructure for statistical computing using the MapReduce paradigm for R. This framework allows users to write computations in a high-level language that are then broken up and distributed to worker tasks in Google datacenters. Results are collected in a scalable, distributed data store and returned to the interactive user session. We apply our approach to a forecasting application that fits a variety of models, prohibiting an analytical description of the statistical uncertainty associated with the overall forecast. To overcome this, we generate simulation-based uncertainty bands, which necessitates a large number of computationally intensive realizations. Our technique cut total run time by a factor of 300. Distributing the computation across many machines permits analysts to focus on statistical issues while answering questions that would be intractable without significant parallel computational infrastructure. We present real-world performance characteristics from our application to allow practitioners to better understand the nature of massively parallel statistical simulations in R.
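The simulation-based uncertainty bands described in the abstract follow a map-and-reduce pattern: each worker produces one independent forecast realization, and the collected realizations are reduced to per-horizon quantiles. The sketch below is a minimal illustration of that pattern in Python, not the authors' R framework or their forecasting models. The function names (`simulate_path`, `uncertainty_bands`, `run`), the bootstrap random-walk model, the sample `history`, and the use of a local thread pool in place of datacenter workers are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_path(seed, history, horizon=12):
    """Map step (hypothetical toy model): one simulated future path.

    Future increments are bootstrapped from the historical increments.
    """
    rng = random.Random(seed)  # per-task seed keeps each realization reproducible
    steps = [b - a for a, b in zip(history, history[1:])]
    level = history[-1]
    path = []
    for _ in range(horizon):
        level += rng.choice(steps)  # resample one observed increment
        path.append(level)
    return path


def quantile(sorted_values, q):
    """Simple rank-based quantile of an already sorted list."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def uncertainty_bands(paths, lower=0.05, upper=0.95):
    """Reduce step: (lower, median, upper) across realizations at each horizon."""
    bands = []
    for values in zip(*paths):  # all realizations at one horizon step
        ordered = sorted(values)
        bands.append(
            (quantile(ordered, lower), quantile(ordered, 0.5), quantile(ordered, upper))
        )
    return bands


def run(history, n_realizations=1000, horizon=12, workers=8):
    """Distribute realizations over a local pool, then reduce to bands.

    The thread pool only stands in for remote worker tasks; it does not
    provide the speedup a real cluster would.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(
            pool.map(lambda s: simulate_path(s, history, horizon), range(n_realizations))
        )
    return uncertainty_bands(paths)


history = [100, 102, 101, 105, 107, 110, 108, 112]  # made-up series for illustration
bands = run(history)
```

Because every realization depends only on its seed and the shared history, the map step has no cross-task communication, which is the property that lets this kind of simulation spread across many machines.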