Cross-domain sentiment classification via spectral feature alignment

TLDR

Sentiment classification predicts polarity from text, but manual labeling is costly and differences between domains cause poor transfer performance. The authors aim to create a general cross‑domain sentiment classifier that works without target‑domain labels by leveraging labeled source‑domain data. They propose spectral feature alignment (SFA), which co‑clusters domain‑specific and domain‑independent words into unified clusters in a latent space to bridge the domain gap and train accurate target‑domain classifiers. Experiments on two real‑world datasets show that SFA significantly outperforms prior cross‑domain sentiment methods.

Abstract

Sentiment classification aims to automatically predict the sentiment polarity (e.g., positive or negative) of user-published sentiment data (e.g., reviews, blogs). Although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data, the labeling work can be time-consuming and expensive. Meanwhile, users often use different words when they express sentiment in different domains. If we directly apply a classifier trained in one domain to other domains, the performance will be very low due to the differences between these domains. In this work, we develop a general solution to sentiment classification when we do not have any labels in a target domain but have some labeled data in a different domain, regarded as the source domain. In this cross-domain sentiment classification setting, to bridge the gap between the domains, we propose a spectral feature alignment (SFA) algorithm to align domain-specific words from different domains into unified clusters, with the help of domain-independent words as a bridge. In this way, the clusters reduce the gap between the domain-specific words of the two domains, and the resulting representation can be used to train accurate sentiment classifiers for the target domain. Compared to previous approaches, SFA can discover a robust representation for cross-domain data by fully exploiting the relationship between the domain-specific and domain-independent words via simultaneously co-clustering them in a common latent space. We perform extensive experiments on two real-world datasets, and demonstrate that SFA significantly outperforms previous approaches to cross-domain sentiment classification.
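The abstract describes the alignment step only at a high level, so the following is a minimal sketch of one way such a spectral co-clustering could look, not the authors' implementation. It assumes a non-negative co-occurrence matrix `M` between domain-specific words (rows) and domain-independent words (columns), treats it as a bipartite graph, and embeds the domain-specific words using the top eigenvectors of the symmetrically normalized affinity matrix. The function names `spectral_alignment` and `augment`, the toy word lists, and the trade-off parameter `gamma` are all hypothetical and chosen for illustration.

```python
import numpy as np

def spectral_alignment(M, k):
    """Embed domain-specific words into a k-dimensional latent space.

    M: (n_specific, n_independent) non-negative co-occurrence matrix
       (an assumed input; the abstract does not specify how it is built).
    Returns an (n_specific, k) matrix with one row per domain-specific word.
    """
    n_spec, n_ind = M.shape
    n = n_spec + n_ind
    # Bipartite affinity matrix: the only edges run between the two word sets.
    A = np.zeros((n, n))
    A[:n_spec, n_spec:] = M
    A[n_spec:, :n_spec] = M.T
    # Symmetric normalization D^{-1/2} A D^{-1/2}, guarding isolated words.
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    # Keep the eigenvectors belonging to the k largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return top[:n_spec]

def augment(X_all, X_spec, U, gamma=0.6):
    """Append aligned features to the original bag-of-words features.

    X_all:  (n_docs, n_words) original features.
    X_spec: (n_docs, n_specific) counts of domain-specific words only.
    gamma:  hypothetical weight on the aligned features.
    """
    return np.hstack([X_all, gamma * (X_spec @ U)])

# Toy data (invented for illustration): rows are domain-specific words,
# columns are the domain-independent words "good" and "bad".
specific_words = ["sharp", "blurry", "hooked", "boring"]
M = np.array([
    [3.0, 0.0],  # "sharp"  (electronics) co-occurs with "good"
    [0.0, 2.0],  # "blurry" (electronics) co-occurs with "bad"
    [4.0, 0.0],  # "hooked" (video games) co-occurs with "good"
    [0.0, 3.0],  # "boring" (video games) co-occurs with "bad"
])
U = spectral_alignment(M, k=2)
```

In this toy matrix, "sharp" and "hooked" come from different domains but share the bridge word "good", so their rows in `U` point in the same direction, while "sharp" and "blurry" end up orthogonal. A classifier trained on source-domain documents passed through `augment` can then use the shared latent features on target-domain documents.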
