Publication | Closed Access
Handling Imbalanced Data Sets in Insurance Risk Modeling
Citations: 28
References: 10
Year: 2000
Unknown Venue
Engineering, Machine Learning, Risk Model Validation, Insurance Risks, Insurance Risk Modeling, Data Science, Data Mining, Class Imbalance, Decision Tree, Risk Management, Management, Insurance Risk, Decision Tree Learning, Insurance, Statistics, Quantitative Management, Prediction Modelling, Predictive Analytics, Knowledge Discovery, Risk, Casualty Insurance, Statistical Learning Theory, Risk Analysis (Business)
As owners of cars, homes, and other property, consumers buy property and casualty insurance to protect themselves against unexpected events such as accidents, fire, and theft. Such events occur very rarely at the level of individual policyholders. Data sets constructed for the purpose of insurance risk modeling are therefore highly imbalanced: in any given time period, most policyholders file no claims, a small percentage file one claim, and an even smaller percentage file two or more claims. This paper presents some of the tree-based learning techniques we have developed to model insurance risks. Two aspects of our approach distinguish it from other tree-based methods. First, it incorporates a split-selection criterion tailored to the specific statistical characteristics of insurance data. Second, it uses constraints on the statistical accuracies of model parameter estimates to guide the construction of splits, in order to overcome the selection biases that arise from the imbalance present in the data.
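To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' algorithm; the abstract does not specify either the criterion or the constraint. It assumes, purely for illustration, that claim counts are Poisson distributed given exposure, that a split is scored by its gain in Poisson log-likelihood, and that the accuracy constraint takes the form of a minimum number of observed claims in each child segment. The names `poisson_loglik`, `best_split`, and `min_claims` are hypothetical.

```python
import math


def poisson_loglik(claims, exposure):
    """Maximized Poisson log-likelihood of a segment, up to constants.

    Assumes exposure > 0. At the fitted rate claims / exposure, the
    rate-dependent part of the log-likelihood is N*log(N/E) - N.
    """
    if claims == 0:
        return 0.0  # limit of N*log(N/E) - N as N -> 0
    return claims * math.log(claims / exposure) - claims


def best_split(rows, min_claims):
    """Find the best threshold on a single numeric feature.

    rows: iterable of (x, claims, exposure) tuples, one per policy.
    min_claims: assumed accuracy constraint; both children must contain
        at least this many claims, so that each child's estimated claim
        frequency rests on enough events to be reasonably stable.
    Returns (threshold, gain), or None if no split satisfies the constraint.
    """
    rows = sorted(rows, key=lambda r: r[0])
    total_claims = sum(r[1] for r in rows)
    total_exposure = sum(r[2] for r in rows)
    parent = poisson_loglik(total_claims, total_exposure)

    best = None
    left_claims = 0
    left_exposure = 0.0
    for i in range(len(rows) - 1):
        x, claims, exposure = rows[i]
        left_claims += claims
        left_exposure += exposure
        next_x = rows[i + 1][0]
        if next_x == x:
            continue  # cannot place a threshold between equal values
        right_claims = total_claims - left_claims
        right_exposure = total_exposure - left_exposure
        if left_claims < min_claims or right_claims < min_claims:
            continue  # too few claims on one side to trust the estimate
        gain = (
            poisson_loglik(left_claims, left_exposure)
            + poisson_loglik(right_claims, right_exposure)
            - parent
        )
        if best is None or gain > best[1]:
            best = ((x + next_x) / 2, gain)
    return best
```

Counting claims rather than records reflects the imbalance described above: a segment with thousands of policies can still contain only a handful of claims, in which case its estimated claim frequency is unreliable. Raising `min_claims` excludes such splits, possibly at the cost of a lower likelihood gain.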