Publication | Closed Access
Modeling and Optimization for Big Data Analytics: (Statistical) learning tools for our era of data deluge
Citations: 242
References: 66
Year: 2014
Keywords: Cluster Computing, Engineering, Machine Learning, Big Data Analytics, Large Volumes, Unsupervised Machine Learning, Big Data Model, Data Science, Data Mining, Pattern Recognition, Learning Tools, Management, Data Deluge, Multilinear Subspace Learning, Principal Component Analysis, Data Management, Statistics, Data Modeling, Predictive Analytics, Knowledge Discovery, Computer Science, Dimensionality Reduction, Deep Learning, Big Data Acquisition, Sparse Representation, Massive Data Processing, Big Data
With pervasive sensors continuously collecting and storing massive amounts of information, there is no doubt this is an era of data deluge. Learning from these large volumes of data is expected to bring significant science and engineering advances along with improvements in quality of life. However, with such a big blessing come big challenges. Running analytics on voluminous data sets by central processors and storage units seems infeasible, and with the advent of streaming data sources, learning must often be performed in real time, typically without a chance to revisit past entries. Workhorse signal processing (SP) and statistical learning tools have to be re-examined in today's high-dimensional data regimes. This article contributes to the ongoing cross-disciplinary efforts in data science by putting forth encompassing models capturing a wide range of SP-relevant data analytic tasks, such as principal component analysis (PCA), dictionary learning (DL), compressive sampling (CS), and subspace clustering. It offers scalable architectures and optimization algorithms for decentralized and online learning problems, while revealing fundamental insights into the various analytic and implementation tradeoffs involved. Extensions of the encompassing models to timely data-sketching, tensor- and kernel-based learning tasks are also provided. Finally, the close connections of the presented framework with several big data tasks, such as network visualization, decentralized and dynamic estimation, prediction, and imputation of network link load traffic, as well as imputation in tensor-based medical imaging, are highlighted.
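As an illustration of one of the learning tasks the abstract names, the sketch below shows PCA-based dimensionality reduction via the SVD of a centered data matrix. This is a generic textbook formulation, not the decentralized or online algorithms developed in the article itself; the function name `pca_project` and the parameter `k` are illustrative choices.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto their top-k principal directions.

    A minimal PCA sketch: center the data, take the SVD, and keep
    the leading k right singular vectors as the principal axes.
    """
    X_centered = X - X.mean(axis=0)  # remove the per-feature mean
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T     # scores in the top-k subspace

# Example: reduce 100 points in 5 dimensions to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Z = pca_project(X, 2)
print(Z.shape)  # (100, 2)
```

By construction, the first projected coordinate captures at least as much variance as the second, which is the property that makes PCA useful for compressing high-dimensional data streams.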