Online Human-Bot Interactions: Detection, Estimation, and Characterization

TLDR

Social media content is increasingly generated by autonomous social bots. The study presents a framework for detecting social bots on Twitter. The framework uses over a thousand features from user metadata, tweet content, sentiment, network patterns, and activity time series, and is benchmarked on a publicly available Twitter bot dataset enriched with manually annotated human and bot accounts. The models achieve high accuracy and estimate that 9–15% of active Twitter accounts are bots. Further analysis shows that simple bots tend to interact with more human-like bots, that bots use different retweet and mention strategies for different target groups, and that accounts cluster into several subclasses, including spammers and self-promoters.

Abstract

Increasing evidence suggests that a growing amount of social media content is generated by autonomous entities known as social bots. In this work we present a framework to detect such entities on Twitter. We leverage more than a thousand features extracted from public data and metadata about users: friends, tweet content and sentiment, network patterns, and activity time series. We benchmark the classification framework by using a publicly available dataset of Twitter bots. This training data is enriched by a manually annotated collection of active Twitter users that includes both humans and bots of varying sophistication. Our models yield high accuracy and agreement with each other and can detect bots of different nature. Our estimates suggest that between 9% and 15% of active Twitter accounts are bots. Characterizing ties among accounts, we observe that simple bots tend to interact with bots that exhibit more human-like behaviors. Analysis of content flows reveals retweet and mention strategies adopted by bots to interact with different target groups. Using clustering analysis, we characterize several subclasses of accounts, including spammers, self-promoters, and accounts that post content from connected applications.
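
To illustrate how a population estimate expressed as a range can follow from classifier output, here is a minimal sketch. It assumes each account receives a bot score in [0, 1] and that the share of bots is read off at more than one decision threshold. The abstract does not describe the study's estimation procedure, so this is not it: the function names, scores, and thresholds are all hypothetical.

```python
def bot_fraction(scores, threshold):
    """Share of accounts whose bot score is at or above the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    flagged = sum(1 for s in scores if s >= threshold)
    return flagged / len(scores)


def fraction_range(scores, thresholds):
    """Lowest and highest bot share across a set of candidate thresholds."""
    fractions = [bot_fraction(scores, t) for t in thresholds]
    return min(fractions), max(fractions)


# Hypothetical classifier outputs in [0, 1]; these are NOT data from the study.
example_scores = [0.05, 0.10, 0.12, 0.20, 0.31, 0.35, 0.44, 0.48, 0.52, 0.91]

# Two illustrative thresholds: a lenient one and a conservative one.
low, high = fraction_range(example_scores, thresholds=[0.5, 0.9])
print(f"estimated bot share: {low:.0%} to {high:.0%}")
```

A stricter threshold flags fewer accounts, so the reported range reflects how sensitive the estimate is to the decision boundary, not a single point value.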