Publication | Closed Access
Machine Learning for Imbalanced Datasets: Application in Medical Diagnostic
Citations: 58
References: 9
Year: 2006
Engineering, Machine Learning, Diagnosis, ROC Curve, Classification Method, Data Science, Data Mining, Pattern Recognition, Class Imbalance, Biostatistics, Public Health, Imbalanced Datasets, Predictive Analytics, Knowledge Discovery, Intelligent Classification, Epidemiology, Data Classification, Classification, Classifier System, Imbalanced Class Distribution, Health Informatics
In this paper, we present a new rule induction algorithm for machine learning in medical diagnosis. Medical datasets, like many other real-world datasets, exhibit an imbalanced class distribution. However, class imbalance is not the only problem to solve for this kind of dataset; beyond the poor classification accuracy caused by the class distribution, other problems must also be considered. Therefore, we propose a different strategy based on maximizing the classification accuracy of the minority class, as opposed to the commonly used sampling and cost-based techniques. Our experiments were conducted using an original dataset for cardiovascular disease diagnosis and three public datasets. The experiments were performed using standard classifiers (Naive Bayes, C4.5 and k-Nearest Neighbor), emerging classifiers (Neural Networks and Support Vector Machines) and other classifiers used for imbalanced datasets (Ripper and Random Forest). In all the tests, our algorithm achieved competitive results in terms of accuracy and area under the ROC curve, and it outperformed the other classifiers in terms of comprehensibility and validity.
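As an illustration of the evaluation criteria named in the abstract, the following minimal Python sketch shows why overall accuracy can be misleading on an imbalanced dataset and how minority-class accuracy and area under the ROC curve can be computed. It is not the rule induction algorithm of the paper; the function names and the toy data (95 healthy cases, 5 diseased cases) are hypothetical and chosen only for illustration.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all examples that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, label):
    """Fraction of examples whose true class is `label` that were predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_label:
        raise ValueError(f"no examples with true class {label!r}")
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example receives a higher score than a randomly chosen
    negative example (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 healthy (0) and 5 diseased (1) patients.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_majority = [0] * 100

print("overall accuracy:       ", overall_accuracy(y_true, y_majority))
print("majority-class accuracy:", per_class_accuracy(y_true, y_majority, 0))
print("minority-class accuracy:", per_class_accuracy(y_true, y_majority, 1))
```

On this toy data the always-majority classifier reaches a high overall accuracy while detecting none of the diseased cases, which is the kind of failure that motivates optimizing minority-class accuracy directly.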