Publication | Open Access
Empath
319 Citations | 45 References | Year: 2016
Unknown Venue
Topics: Engineering, Seed Terms, Corpus Linguistics, Present Empath, Text Mining, Word Embeddings, Natural Language Processing, Computational Social Science, Data Science, Seed Words, Computational Linguistics, Language Engineering, Affective Computing, Language Studies, Content Analysis, Machine Translation, Nlp Task, Knowledge Discovery, Distributional Semantics, Linguistics
Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (for example, "bleed" and "punch" to generate the category "violence"). Empath draws connotations between words and phrases by training a deep neural embedding on more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses this embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories that we generated from common topics in our web dataset, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r = 0.906) with similar categories in LIWC.
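The abstract describes two steps: expanding a few seed words into a category through nearest neighbors in an embedding space, then scoring text by how many of its words fall into each category. The sketch below illustrates both under loud assumptions: the 3-dimensional `TOY_VECTORS` table, the centroid-plus-cosine ranking, and the helper names `expand_category` and `analyze` are all hypothetical and are not Empath's actual model or API. The `validate` callable is only a stand-in for the crowd-powered filter, which is not modeled here.

```python
import math
from collections import Counter

# Hypothetical toy word vectors. Empath's real embedding was trained on
# ~1.8 billion words of fiction and has far more words and dimensions.
TOY_VECTORS = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "kick": [0.75, 0.25, 0.1],
    "smile": [0.1, 0.9, 0.2],
    "laugh": [0.05, 0.95, 0.1],
    "vote": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, vectors, k, validate=lambda word: True):
    """Return the seeds plus the k non-seed words closest to the seed centroid.

    `validate` is a hypothetical stand-in for human (crowd) filtering:
    candidates it rejects are never added to the category.
    """
    dim = len(vectors[seeds[0]])
    centroid = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [w for w in vectors if w not in seeds and validate(w)]
    # Rank remaining words by similarity to the average seed vector.
    candidates.sort(key=lambda w: cosine(vectors[w], centroid), reverse=True)
    return list(seeds) + candidates[:k]


def analyze(text, categories):
    """Fraction of whitespace tokens in `text` that belong to each category."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        name: sum(counts[w] for w in words) / total
        for name, words in categories.items()
    }
```

For example, `expand_category(["bleed", "punch"], TOY_VECTORS, k=2)` ranks "stab" and "kick" above the unrelated words, and `analyze("He will punch and stab", {"violence": ...})` then reports that 2 of the 5 tokens fall into that category. Averaging the seed vectors is one simple way to rank candidates; a real system could instead score each seed separately or use a learned similarity.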