Leveraging Financial Social Media Data for Corporate Fraud Detection

TLDR

Corporate fraud causes significant financial losses, erodes investor confidence, and damages the overall economy, yet detecting it is time-consuming and has traditionally relied on financial data and textual analysis of financial statements. The study proposes an analytic framework, grounded in systemic functional linguistics, that uses unstructured data from financial social media platforms to assess corporate fraud risk. Using a matched sample of 64 fraudulent and 64 nonfraudulent firms, the framework automatically extracts sentiment, emotion, topic, lexical, and social-network signals from social media posts made before the alleged violation and feeds them into machine-learning classifiers. Performance is compared against baselines that use only financial ratios or only language-based features, and robustness is checked by detecting leaked information and rumors, testing on a new data set, and conducting an applicability check. The results indicate that financial social media data add value and serve as a proof of concept that such data can complement traditional fraud detection methods.

Abstract

Corporate fraud can lead to significant financial losses and cause immeasurable damage to investor confidence and the overall economy. Detecting such fraud is a time-consuming and challenging task. Traditionally, researchers have relied on financial data and/or textual content from financial statements to detect corporate fraud. Guided by systemic functional linguistics (SFL) theory, we propose an analytic framework that taps into unstructured data from financial social media platforms to assess the risk of corporate fraud. We assemble a unique data set comprising 64 fraudulent firms and a matched sample of 64 nonfraudulent firms, together with social media data from before each firm's alleged fraud violation as reported in Accounting and Auditing Enforcement Releases (AAERs). Our framework automatically extracts signals such as sentiment features, emotion features, topic features, lexical features, and social network features, which are then fed into machine learning classifiers for fraud detection. We evaluate and compare the performance of our algorithm against baseline approaches that use only financial ratios and only language-based features, respectively. We further validate the robustness of our algorithm by detecting leaked information and rumors, testing the algorithm on a new data set, and conducting an applicability check. Our results demonstrate the value of financial social media data and serve as a proof of concept of using such data to complement traditional fraud detection methods.
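
The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Post` record, the tiny word lists, and the three features shown (one sentiment, one lexical, one social-network signal) are hypothetical stand-ins chosen only to show how a firm's posts might be collapsed into a single feature vector for a classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy lexicons, for illustration only; the abstract does not
# specify which sentiment resources the framework uses.
POSITIVE = {"strong", "growth", "beat", "bullish"}
NEGATIVE = {"fraud", "lawsuit", "restate", "bearish"}


@dataclass
class Post:
    """One social media post about a firm (hypothetical schema)."""
    author: str
    text: str
    reply_to: Optional[str] = None  # author being replied to, if any


def extract_features(posts: List[Post]) -> Dict[str, float]:
    """Aggregate a firm's pre-violation posts into one feature vector."""
    if not posts:
        return {"sentiment": 0.0, "avg_tokens": 0.0, "reply_ratio": 0.0}
    pos = neg = total_tokens = 0
    for post in posts:
        tokens = [t.strip(".,!?").lower() for t in post.text.split()]
        total_tokens += len(tokens)
        pos += sum(t in POSITIVE for t in tokens)
        neg += sum(t in NEGATIVE for t in tokens)
    replies = sum(p.reply_to is not None for p in posts)
    return {
        # Sentiment signal: net polarity of lexicon hits, in [-1, 1].
        "sentiment": (pos - neg) / max(pos + neg, 1),
        # Lexical signal: average post length in tokens.
        "avg_tokens": total_tokens / len(posts),
        # Social-network signal: share of posts that reply to another user.
        "reply_ratio": replies / len(posts),
    }


# Invented example posts, not data from the study.
example_posts = [
    Post("a", "Strong growth this quarter!"),
    Post("b", "Heard they may restate earnings.", reply_to="a"),
]
features = extract_features(example_posts)
```

In a fuller pipeline, one such vector per firm (labeled fraudulent or nonfraudulent) would be stacked into a matrix and passed to a classifier of choice, alongside or instead of financial-ratio features for the baseline comparison.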
