
TLDR

While the financial data in annual reports has been studied extensively for fraud detection, the text of those reports remains underexplored. This study posits that linguistic clues in that text can reveal the likelihood of fraud. The authors used natural language processing (NLP) to analyze both the verbal content and the presentation style of annual reports, and they extracted linguistic features that distinguish fraudulent reports from nonfraudulent ones. Adding these features raised prediction accuracy from 56.75% with a bag-of-words baseline to 89.51%.

Abstract

Extensive research has been done on the analytical and empirical examination of financial data in annual reports to detect fraud; however, there is scant research on analyzing the text of annual reports to detect fraud. The basic premise of this research is that clues hidden in the text can be detected to determine the likelihood of fraud. In this research, we use natural language processing tools to examine both the verbal content and the presentation style of the qualitative portion of annual reports, and we explore linguistic features that distinguish fraudulent annual reports from nonfraudulent ones. Our results indicate that linguistic features are an effective means of detecting fraud. We improved the prediction accuracy of our fraud detection model from an initial baseline of 56.75 percent, using a “bag of words” approach, to 89.51 percent when we incorporated linguistically motivated features inspired by our informed reasoning and domain knowledge.
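The abstract contrasts a “bag of words” baseline with linguistically motivated features. The Python sketch below illustrates that difference under stated assumptions: `bag_of_words` keeps only word counts and discards order and style, while `style_features` computes a few presentation-style measures. The regex tokenizer, the feature names, and the `HEDGE_WORDS` list are hypothetical choices made for illustration; they are not the authors' feature set, which this summary does not specify.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens from a deliberately simple regex tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Bag-of-words representation: token -> count, word order discarded."""
    return Counter(tokenize(text))


# Hypothetical hedging-word list, for illustration only (not from the paper).
HEDGE_WORDS = {"may", "might", "could", "possibly", "approximately", "believe"}


def style_features(text: str) -> dict[str, float]:
    """A few illustrative presentation-style features (assumed, not the authors')."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    n_tokens = len(tokens) or 1  # avoid division by zero on empty input
    return {
        # Average number of word tokens per sentence.
        "avg_sentence_len": len(tokens) / (len(sentences) or 1),
        # Lexical diversity: distinct tokens over total tokens.
        "type_token_ratio": len(set(tokens)) / n_tokens,
        # Share of tokens drawn from the hypothetical hedging list.
        "hedge_rate": sum(t in HEDGE_WORDS for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    sample = "Revenue may increase. We believe results could improve next year."
    print(bag_of_words(sample).most_common(3))
    print(style_features(sample))
```

In a complete pipeline, either representation would be converted into a numeric vector and passed to a classifier trained on labeled fraudulent and nonfraudulent reports; the classifier the authors used is not stated here.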
