Adding semantics to microblog posts

TLDR

Microblogs are a valuable real-time source of information for marketing, intelligence, and reputation management, yet determining what a post is about is difficult because microblog language is creative, informal, highly contextualized, and short. The study determines a post's topic through semantic linking: it automatically identifies concepts that are semantically related to the post and links them to the corresponding Wikipedia articles, so that the concepts can feed downstream social-media mining. On a purpose-built test collection of tweets, the proposed machine-learning method significantly outperforms recently proposed semantic-linking approaches, especially in terms of precision.

Abstract

Microblogs have become an important source of information for marketing, intelligence, and reputation management. Streams of microblogs are of great value because of their direct and real-time nature. Determining what an individual microblog post is about, however, can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication. We propose a solution to the problem of determining what a microblog post is about through semantic linking: we add semantics to a post by automatically identifying concepts that are semantically related to it and generating links to the corresponding Wikipedia articles. The identified concepts can subsequently be used for, e.g., social media mining, thereby reducing the need for manual inspection and selection. Using a purpose-built test collection of tweets, we show that recently proposed approaches for semantic linking do not perform well, mainly due to the idiosyncratic nature of microblog posts. We propose a novel method based on machine learning with a set of innovative features and show that it achieves significant improvements over all other methods, especially in terms of precision.
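
To make the idea of semantic linking concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a tiny, hypothetical anchor dictionary (ANCHORS) that maps anchor texts to Wikipedia titles with made-up link counts, generates word n-grams from a post, and scores each candidate concept by how often the n-gram links to that title. The fixed threshold is a simple stand-in for the learned model and features the abstract refers to; all names and numbers here are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy anchor dictionary: anchor text -> {Wikipedia title: link count}.
# The counts are invented for illustration only.
ANCHORS = {
    "apple": {"Apple Inc.": 80, "Apple": 20},
    "iphone": {"IPhone": 50},
    "steve jobs": {"Steve Jobs": 40},
}


def ngrams(tokens, max_n=3):
    """Yield every word n-gram of the token list, for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def link_post(post, anchors=ANCHORS, threshold=0.5):
    """Return (title, score) pairs for concepts whose score meets the threshold.

    The score is the fraction of the n-gram's links that point to the title.
    A trained classifier would replace the fixed threshold in a real system.
    """
    # Crude normalization: lowercase and strip punctuation, '#', and '@'.
    tokens = [t.strip(".,!?#@").lower() for t in post.split()]
    tokens = [t for t in tokens if t]

    scores = defaultdict(float)
    for gram in ngrams(tokens):
        targets = anchors.get(gram)
        if not targets:
            continue
        total = sum(targets.values())
        for title, count in targets.items():
            # Keep the best score seen for each candidate concept.
            scores[title] = max(scores[title], count / total)

    # Sort by descending score, then by title for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(title, score) for title, score in ranked if score >= threshold]


links = link_post("Steve Jobs unveils the new #iPhone at Apple!")
```

In this toy setting the ambiguous anchor "apple" resolves to the company rather than the fruit only because of the invented counts. Preferring few, high-scoring links mirrors the emphasis on precision in the abstract.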
