Adversarial classification

TLDR

Data mining algorithms typically assume that the data-generating process is independent of the data miner. In adversarial domains such as spam, intrusion, fraud, surveillance, and counter-terrorism, however, attackers manipulate the data to make classifiers produce false negatives, so classifier performance degrades rapidly after deployment. The authors develop a formal framework and algorithms for countering this manipulation. They model classification as a game between the classifier and the adversary and compute the classifier that is optimal against the adversary's optimal strategy, replacing repeated manual reconstruction of the classifier. Experiments on spam detection show that this game-theoretic classifier outperforms a classifier learned in the standard way and, within the parameters of the problem, adapts automatically to the adversary's evolving manipulations.

Abstract

Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detection, intrusion detection, fraud detection, surveillance and counter-terrorism, this is far from the case: the data is actively manipulated by an adversary seeking to make the classifier produce false negatives. In these domains, the performance of a classifier can degrade rapidly after it is deployed, as the adversary learns to defeat it. Currently the only solution to this is repeated, manual, ad hoc reconstruction of the classifier. In this paper we develop a formal framework and algorithms for this problem. We view classification as a game between the classifier and the adversary, and produce a classifier that is optimal given the adversary's optimal strategy. Experiments in a spam detection domain show that this approach can greatly outperform a classifier learned in the standard way, and (within the parameters of the problem) automatically adapt the classifier to the adversary's evolving manipulations.
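To make the game framing concrete, here is a toy one-round sketch. It is an illustration under stated assumptions, not the paper's algorithm. It assumes three hypothetical binary word features, made-up independent class-conditional word probabilities, a hypothetical per-feature modification cost, and a fixed utility the adversary gains whenever a spam instance is labeled ham. The adversary best-responds to a naive Bayes-decision classifier by making the cheapest change that evades it, provided that change costs less than the utility. The classifier then reapplies the Bayes rule to the spam distribution as shifted by that strategy.

```python
from itertools import product

# Hypothetical setup: 3 binary word-presence features, e.g. "free", "winner", "meeting".
FEATURES = 3
INSTANCES = list(product([0, 1], repeat=FEATURES))

def independent(probs):
    """Distribution over INSTANCES with independent features (illustrative only)."""
    dist = {}
    for x in INSTANCES:
        p = 1.0
        for bit, q in zip(x, probs):
            p *= q if bit else 1 - q
        dist[x] = p
    return dist

def bayes_classifier(p_spam, p_ham, prior_spam=0.5):
    """Return x -> True (spam) under the Bayes decision rule with 0/1 loss."""
    def classify(x):
        return prior_spam * p_spam[x] > (1 - prior_spam) * p_ham[x]
    return classify

def modification_cost(x, x_new, cost_per_flip):
    # Hypothetical cost model: each flipped feature has its own fixed cost.
    return sum(c for a, b, c in zip(x, x_new, cost_per_flip) if a != b)

def adversary_best_response(x, classify, cost_per_flip, utility):
    """Cheapest instance labeled ham; keep x if already ham or evasion is not worth it."""
    if not classify(x):
        return x
    candidates = [x_new for x_new in INSTANCES if not classify(x_new)]
    if not candidates:
        return x
    best = min(candidates, key=lambda x_new: modification_cost(x, x_new, cost_per_flip))
    return best if modification_cost(x, best, cost_per_flip) < utility else x

def adversarial_spam_distribution(p_spam, classify, cost_per_flip, utility):
    """P'(x'|spam): move each spam instance's mass to the adversary's chosen instance."""
    shifted = {x: 0.0 for x in INSTANCES}
    for x in INSTANCES:
        shifted[adversary_best_response(x, classify, cost_per_flip, utility)] += p_spam[x]
    return shifted

def detection_rate(classify, p):
    """Probability mass of p that the classifier labels spam."""
    return sum(p[x] for x in INSTANCES if classify(x))

# Made-up parameters: removing spammy words is expensive, adding "meeting" is cheap.
p_spam = independent([0.8, 0.7, 0.1])
p_ham = independent([0.1, 0.1, 0.6])
cost_per_flip = (3.0, 3.0, 1.0)
utility = 2.0

naive = bayes_classifier(p_spam, p_ham)
p_spam_attacked = adversarial_spam_distribution(p_spam, naive, cost_per_flip, utility)
aware = bayes_classifier(p_spam_attacked, p_ham)  # classifier's response to the adversary

print(round(detection_rate(naive, p_spam), 3))           # 0.902
print(round(detection_rate(naive, p_spam_attacked), 3))  # 0.56
print(round(detection_rate(aware, p_spam_attacked), 3))  # 0.94
```

With these made-up numbers, the naive classifier's spam detection rate falls from about 0.90 to 0.56 once the adversary adds the cheap benign-looking word. The adversary-aware classifier detects 0.94 of the attacked spam, at the price of flagging more ham (0.118 of the ham mass versus 0.082). The sketch stops after a single round; modeling a further adversary move against the aware classifier would mean iterating the loop.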
