Publication | Open Access
Big data preprocessing: methods and prospects
Citations: 671
References: 77
Year: 2016
Big Data Preprocessing, Big Data Acquisition, Data Processing, Engineering, Data Science, Data Mining, Business Intelligence, Massive Growth, Knowledge Discovery, Management, Data Integration, Big Data Scenario, Computer Science, Massive Data Processing, Data Management, Big Data Infrastructure, Big Data, Big Data Model
The rapid expansion of data volume, velocity, and variety has made Big Data a key challenge, demanding high‑performance processing and large computational infrastructure for effective analysis. This paper reviews data‑preprocessing methods for Big Data mining, defining their characteristics, categorizing approaches, and examining their integration with current technologies. The authors analyze preprocessing families in the context of Hadoop, Spark, and Flink, discuss research challenges, and advocate for focused study of preprocessing techniques within emerging Big Data learning paradigms.
Recent years have seen massive growth in the scale of data, a key factor in the Big Data scenario. Big Data can be defined as data of high volume, velocity, and variety that requires new high-performance processing. Addressing big data is a challenging and time-demanding task that requires a large computational infrastructure to ensure successful data processing and analysis. This paper reviews data preprocessing methods for data mining in big data. The definition, characteristics, and categorization of data preprocessing approaches in big data are introduced. The connection between big data and data preprocessing across all families of methods and big data technologies is also examined, including a review of the state of the art. In addition, research challenges are discussed, with a focus on developments in different big data frameworks, such as Hadoop, Spark, and Flink, and on encouraging substantial research effort in some families of data preprocessing methods and in applications to new big data learning paradigms.