Publication | Open Access
Learning from imbalanced data: open challenges and future directions
Citations: 2.3K | References: 55 | Year: 2016
Engineering, Machine Learning, Big Data Analytics, Open Challenges, Imbalanced Learning, Continuous Development, Text Mining, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Statistics, Predictive Analytics, Knowledge Discovery, Intelligent Classification, Computer Science, Data Classification, Data Stream Mining, Big Data
Over two decades, learning from imbalanced data has evolved from the problem of skewed class distributions in binary tasks to encompass big data, hybrid methods, and new challenges in real-time, adaptive contexts. The paper outlines open issues and challenges that must be addressed to advance the field of imbalanced learning. It identifies seven key research areas, spanning classification, regression, clustering, data streams, big data analytics, and applications such as social media and computer vision, and offers discussion and suggestions for future work in each.
Despite more than two decades of continuous development, learning from imbalanced data is still a focus of intense research. Starting as a problem of skewed distributions in binary tasks, the topic has evolved well beyond this conception. With the expansion of machine learning and data mining, combined with the arrival of the big data era, we have gained deeper insight into the nature of imbalanced learning while at the same time facing new challenges. Data-level and algorithm-level methods are constantly being improved, and hybrid approaches are gaining popularity. Recent trends focus on analyzing not only the disproportion between classes but also other difficulties embedded in the nature of the data. New real-life problems motivate researchers to focus on computationally efficient, adaptive, and real-time methods. This paper discusses open issues and challenges that need to be addressed to further develop the field of imbalanced learning. Seven vital areas of research are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics, and applications, e.g., in social media and computer vision. The paper provides a discussion of, and suggestions for, lines of future research in each of them.