A Comprehensive Survey of Data Mining-Based Accounting-Fraud Detection Research

TLDR

Accounting-fraud detection research has drawn on auditor, governance, financial statement, industry, and trading data. Earlier work relied mainly on auditor data, while later work incorporated shared data and public financial statement data. Ratio data are considered more effective than raw accounting data, yet time-series mining remains underexplored, and small fraud samples may inflate estimates of model performance. This survey categorizes, compares, and summarizes the datasets, algorithms, and performance metrics used in automated accounting-fraud detection, and highlights the need for additional unsupervised (no-label) data-mining algorithms. The mining algorithms reviewed include statistical tests, regression analysis, neural networks, decision trees, Bayesian networks, and stacked variables. Neural networks generally achieve higher detection accuracy than regression models, and model-based detection achieves higher detection rates than unaided auditors.
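To make the distinction between ratio data and raw accounting data concrete, the Python sketch below converts raw statement amounts into scale-free ratios. The `Statement` fields, the `ratio_features` helper, and the four ratios chosen are hypothetical illustrations, not the feature set of any surveyed study. Because ratios divide out firm size, a small firm and a firm 100 times larger with proportionally identical statements map to the same feature vector, even though their raw amounts differ by two orders of magnitude.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    # Hypothetical raw accounting items for one firm-year (any currency unit).
    revenue: float
    cost_of_goods_sold: float
    receivables: float
    current_assets: float
    current_liabilities: float
    total_assets: float
    total_liabilities: float


def ratio_features(s: Statement) -> dict[str, float]:
    """Convert raw amounts into scale-free ratios (an illustrative selection only)."""
    return {
        "current_ratio": s.current_assets / s.current_liabilities,
        "debt_to_assets": s.total_liabilities / s.total_assets,
        "receivables_to_sales": s.receivables / s.revenue,
        "gross_margin": (s.revenue - s.cost_of_goods_sold) / s.revenue,
    }


# Two hypothetical firms: the second is the first scaled up by a factor of 100.
small = Statement(100, 60, 20, 50, 25, 200, 80)
large = Statement(10_000, 6_000, 2_000, 5_000, 2_500, 20_000, 8_000)

print(ratio_features(small) == ratio_features(large))  # True
```

A classifier fed the raw amounts would have to learn size effects on its own; removing them up front is one plausible reason ratio inputs tend to be preferred.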

Abstract

This survey paper categorizes, compares, and summarizes the datasets, algorithms, and performance measures in almost all published technical and review articles on automated accounting fraud detection. Most studies take fraudulent and non-fraudulent companies as their data subjects, and the feature values cover auditor data, corporate governance data, financial statement data, industry data, trading data, and other categories. Earlier research relied mostly on auditor data; later research built models using shared data and public financial statement data. Corporate governance data have been widely used. Ratio data are generally believed to be more effective than raw accounting data, and few studies have applied time-series data mining. The retrieved literature used mining algorithms including statistical tests, regression analysis, neural networks, decision trees, Bayesian networks, and stacked variables. Regression analysis is widely used on hidden data. In general, neural networks (NNs) outperform regression models in detection effectiveness and accuracy. The general conclusion is that model-based detection achieves a higher detection rate than auditors working without such assistance. Other unsupervised (no-label) data-mining algorithms need to be introduced. Because fraud samples are small, some studies drew conclusions from performance on their training samples and may therefore have overestimated model effectiveness.
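The warning about small fraud samples can be illustrated with a minimal Python sketch, assuming synthetic firm-years whose features are pure random noise, so that no model can genuinely separate fraud from non-fraud. A 1-nearest-neighbour classifier stands in for any flexible model (it is not one of the surveyed algorithms), and `make_firms`, `nearest_neighbor_predict`, and `accuracy` are hypothetical helpers. Scored on its own training samples the classifier looks perfect; scored on held-out firms, its accuracy is expected to hover around chance level.

```python
import random


def make_firms(n_fraud, n_clean, n_features, rng):
    """Hypothetical firm-years whose features are pure noise: no real fraud signal."""
    rows = []
    for label, count in ((1, n_fraud), (0, n_clean)):
        for _ in range(count):
            rows.append(([rng.gauss(0.0, 1.0) for _ in range(n_features)], label))
    rng.shuffle(rows)
    return rows


def nearest_neighbor_predict(train, x):
    """1-nearest-neighbour classifier: copies the label of the closest training firm."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    return min(train, key=sq_dist)[1]


def accuracy(train, data):
    """Fraction of firms in `data` whose label the classifier predicts correctly."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in data)
    return hits / len(data)


rng = random.Random(42)
firms = make_firms(n_fraud=30, n_clean=30, n_features=8, rng=rng)
train, holdout = firms[:40], firms[40:]

train_acc = accuracy(train, train)      # scored on the same firms it was fit to
holdout_acc = accuracy(train, holdout)  # scored on firms it has never seen
print(f"training accuracy: {train_acc:.2f}")  # 1.00
print(f"holdout accuracy:  {holdout_acc:.2f}")
```

Reporting only the training figure would overstate the model's effect; evaluating on a holdout set, or with cross-validation, avoids this particular bias.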
