Distributed data mining in credit card fraud detection

TLDR

Credit card transaction volume is growing rapidly, and with it the number of stolen account numbers and the resulting losses, which makes scalable, efficient fraud detection essential for the US payment system. The study surveys and evaluates techniques that simultaneously address scalability and efficiency, skewed training data, and unequal error costs in fraud detection. It combines multiple learned fraud detectors under a cost model and applies distributed data mining to build fraud models. Empirical results show that this approach significantly reduces fraud-related losses.

Abstract

Credit card transactions continue to grow in number, taking an ever-larger share of the US payment system and leading to a higher rate of stolen account numbers and subsequent losses by banks. Improved fraud detection has thus become essential to maintain the viability of the US payment system. Banks have used early fraud warning systems for some years, and large-scale data-mining techniques can improve the state of the art in commercial practice. Developing scalable techniques that analyze massive amounts of transaction data and compute fraud detectors efficiently and in a timely manner is an important problem, especially for e-commerce. Besides scalability and efficiency, the fraud-detection task exhibits technical problems that include skewed distributions of training data and nonuniform cost per error, neither of which has been widely studied in the knowledge-discovery and data-mining community. In this article, we survey and evaluate a number of techniques that address these three main issues concurrently. Our proposed methods of combining multiple learned fraud detectors under a "cost model" are general and demonstrably useful; our empirical results demonstrate that we can significantly reduce loss due to fraud through distributed data mining of fraud models.
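To make the idea of judging combined detectors under a cost model concrete, here is a minimal Python sketch. It is an illustration only, not the article's method: the abstract does not specify the cost model, so the fixed investigation overhead (`OVERHEAD`), the `Transaction` fields, the three hand-written rules standing in for learned detectors, and the majority-vote combiner are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Transaction:
    amount: float      # transaction amount in dollars (hypothetical field)
    is_foreign: bool   # hypothetical feature
    is_fraud: bool     # ground-truth label


Detector = Callable[[Transaction], bool]

# Assumption: every alert costs a fixed amount to investigate.
OVERHEAD = 75.0


def majority_vote(detectors: Sequence[Detector]) -> Detector:
    """Combine detectors: flag a transaction when more than half of them flag it."""
    def combined(tx: Transaction) -> bool:
        votes = sum(1 for d in detectors if d(tx))
        return 2 * votes > len(detectors)
    return combined


def total_cost(detector: Detector, txs: Sequence[Transaction],
               overhead: float = OVERHEAD) -> float:
    """Nonuniform error cost: each alert costs `overhead`, whether it turns out
    to be fraud or not; each missed fraud costs the full transaction amount."""
    cost = 0.0
    for tx in txs:
        if detector(tx):
            cost += overhead
        elif tx.is_fraud:
            cost += tx.amount
    return cost


# Hand-written rules standing in for detectors learned on separate data partitions.
base_detectors = [
    lambda tx: tx.amount > 400,
    lambda tx: tx.is_foreign,
    lambda tx: tx.amount > 800,
]
combined = majority_vote(base_detectors)

# Made-up sample data, for illustration only.
txs = [
    Transaction(20.0, False, False),
    Transaction(900.0, True, True),
    Transaction(500.0, False, True),
    Transaction(1200.0, False, False),
    Transaction(50.0, True, False),
]

if __name__ == "__main__":
    print("no detector:", total_cost(lambda tx: False, txs))
    for i, d in enumerate(base_detectors):
        print(f"detector {i}:", total_cost(d, txs))
    print("majority vote:", total_cost(combined, txs))
```

The sketch ranks detectors by total dollar cost rather than by accuracy, so a missed large fraud weighs more than a false alarm. On this toy data a single rule happens to cost less than the majority vote, which is why the comparison has to be made on cost rather than assumed.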
