Publication | Closed Access
One seed to find them all
Citations: 74
References: 28
Year: 2012
Unknown Venue
Engineering, Multimodal Sentiment Analysis, Corpus Linguistics, Sentiment Analysis, Text Mining, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Feature-based Opinion Analysis, Computational Linguistics, Crop Establishment, Language Studies, Content Analysis, Combinatorial Evolution, Bootstrapping Framework, Knowledge Discovery, Opinion Feature Extraction, Evolutionary Biology, Relationship Extraction, Keyword Extraction, Cross-fertilization, Opinion Aggregation
Feature-based opinion analysis has attracted extensive attention recently. Identifying features associated with opinions expressed in reviews is essential for fine-grained opinion mining. One approach is to exploit the dependency relations that occur naturally between features and opinion words, and among features (or opinion words) themselves. In this paper, we propose a generalized approach to opinion feature extraction by incorporating robust statistical association analysis in a bootstrapping framework. The new approach starts with a small set of feature seeds, which it iteratively enlarges by mining feature-opinion, feature-feature, and opinion-opinion dependency relations. Two association model types, namely likelihood ratio tests (LRT) and latent semantic analysis (LSA), are proposed for computing the pair-wise associations between terms (features or opinions). We accordingly propose two robust bootstrapping approaches, LRTBOOT and LSABOOT, both of which need just a handful of initial feature seeds to bootstrap opinion feature extraction. We benchmarked LRTBOOT and LSABOOT against existing approaches on a large number of real-life reviews crawled from the cellphone and hotel domains. Experimental results using varying numbers of feature seeds show that the proposed association-based bootstrapping approach significantly outperforms the competitors. In fact, one seed feature is all that is needed for LRTBOOT to significantly outperform the other methods. This seed feature can simply be the domain feature, e.g., "cellphone" or "hotel". The consequence of our discovery is far-reaching: starting with just one feature seed, typically just the domain concept word, LRTBOOT can automatically extract a large set of high-quality opinion features from the corpus without any supervision or labeled features. This means that the automatic creation of a set of domain features is no longer a pipe dream!
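
To make the bootstrapping idea in the abstract concrete, here is a minimal Python sketch of seed expansion driven by a likelihood ratio test. It is an illustration under stated assumptions, not the paper's LRTBOOT: the abstract does not give the exact statistic, threshold, or candidate selection, so the sketch assumes Dunning's log-likelihood ratio over sentence-level co-occurrence counts (rather than parsed dependency relations), a cutoff of 3.84 (the 95% chi-square critical value with one degree of freedom), and candidate pools supplied by the caller. The names `llr`, `bootstrap`, `feature_cands`, and `opinion_cands` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def _xlogx(x):
    # Convention: 0 * log(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: sentences containing both terms, k12 / k21: only one of them,
    k22: neither. Assumed statistic; the abstract only says "LRT".
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return max(0.0, 2.0 * (cells + _xlogx(n) - rows - cols))


def bootstrap(sentences, seeds, feature_cands, opinion_cands,
              threshold=3.84, max_iters=10):
    """Grow feature and opinion sets from seed features.

    sentences: list of token lists. feature_cands / opinion_cands are
    hypothetical candidate pools (e.g. nouns and adjectives) as sets.
    """
    docs = [set(s) for s in sentences]
    n = len(docs)
    df = Counter(t for d in docs for t in d)  # sentence frequency per term
    co = Counter(p for d in docs for p in combinations(sorted(d), 2))

    def associated(a, b):
        k11 = co[tuple(sorted((a, b)))]
        k12, k21 = df[a] - k11, df[b] - k11
        k22 = n - k11 - k12 - k21
        # Keep only positive association: observed count above expectation.
        positive = k11 * n > df[a] * df[b]
        return positive and llr(k11, k12, k21, k22) >= threshold

    features, opinions = set(seeds), set()
    for _ in range(max_iters):
        # Known features and opinions both act as anchors, which covers
        # feature-feature, feature-opinion, and opinion-opinion links.
        anchors = features | opinions
        new_f = {c for c in feature_cands - features
                 if any(associated(c, a) for a in anchors)}
        new_o = {c for c in opinion_cands - opinions
                 if any(associated(c, a) for a in anchors)}
        if not new_f and not new_o:
            break  # nothing new was found; the sets have converged
        features |= new_f
        opinions |= new_o
    return features, opinions
```

Calling `bootstrap(sentences, {"hotel"}, feature_cands, opinion_cands)` with a single domain word as the seed mirrors the one-seed setting the abstract describes. Replacing sentence-level co-occurrence with counts over dependency relations would bring the sketch closer to the described approach, but that detail is not recoverable from the abstract.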