Sentiment Analysis of Bengali Text using CountVectorizer with Logistic Regression

Abstract

Sentiment analysis is the task of determining the polarity of a text. Sentiment analysis, or opinion mining, on social media data is an emerging research topic, but in Bangladesh only a few studies have addressed Bengali text. In Natural Language Processing, a text document can be represented as a vector, which is an efficient way to measure similarity between texts. Word2vec represents each word as a vector, while Doc2vec represents a whole sentence or document as a vector. CountVectorizer is another technique, which transforms a corpus of text into vectors of token counts. Only a few studies have used Word2vec and Doc2vec, and recent studies have shown that CountVectorizer outperforms both. In this study, we use a dataset of 7,000 Bengali texts and classify their sentiment into two classes: positive and negative. We extract features using CountVectorizer and then apply a machine learning method, Logistic Regression, to the classification task. The results show higher accuracy than previous work.
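The pipeline described above (token counts as features, then Logistic Regression) can be illustrated with a minimal standard-library sketch. It is not the authors' code and not scikit-learn's CountVectorizer: the helper names, the whitespace tokenization, the learning rate, the epoch count, and the four toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_vocabulary(texts):
    """Assign a column index to every distinct whitespace-separated token."""
    tokens = sorted({tok for text in texts for tok in text.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def count_vector(text, vocab):
    """Represent one text as a list of token counts (unknown tokens are ignored)."""
    vec = [0] * len(vocab)
    for tok, n in Counter(text.split()).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    """Estimated probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction errors are computed once per epoch, before any update.
        errors = [score(xi, w, b) - yi for xi, yi in zip(X, y)]
        for j in range(d):
            w[j] -= lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b

def predict(text, vocab, w, b):
    """Return 1 (positive) or 0 (negative) for a raw text."""
    return 1 if score(count_vector(text, vocab), w, b) >= 0.5 else 0

# Invented toy data (not from the paper's 7,000-text dataset).
texts = [
    "খুব ভালো সিনেমা",   # "very good movie"  -> positive
    "সুন্দর গান",         # "beautiful song"   -> positive
    "খুব খারাপ সিনেমা",  # "very bad movie"   -> negative
    "বাজে গান",           # "lousy song"       -> negative
]
labels = [1, 1, 0, 0]

vocab = build_vocabulary(texts)
X = [count_vector(t, vocab) for t in texts]
w, b = train_logistic_regression(X, labels)

pred_pos = predict("ভালো গান", vocab, w, b)    # "good song"
pred_neg = predict("খারাপ গান", vocab, w, b)   # "bad song"
```

On real data one would more likely use a library implementation with proper Bengali tokenization and a held-out test split; the sketch only shows how count features feed a linear classifier.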
