Concepedia

Abstract

Hand-crafted feature engineering is labor-intensive and costly. In this paper, we implement Word2Vec as an alternative to hand-crafted features for sentiment analysis of hotel reviews in the Indonesian language. To obtain the highest sentiment analysis performance, we evaluate three Word2Vec parameters: model architecture, evaluation method, and vector dimension. The evaluation was carried out on our proposed domain-specific corpus, which consists of 2500 hotel reviews in the Indonesian language (1250 positive reviews and 1250 negative reviews). The results show that the highest accuracy is obtained with the following combination of parameters: the Skip-gram architecture, Hierarchical Softmax as the evaluation method, and a vector dimension of 100. The Skip-gram model yields the highest accuracy for rarely occurring words, as found in sentiment analysis tasks. Hierarchical Softmax provides better results because, during training, it uses a binary tree to represent all words in the vocabulary, with leaf nodes representing rare words, so that rarely appearing words inherit vector representations from the tree. Furthermore, to obtain optimal accuracy, the vector dimension and the amount of data should be increased simultaneously.
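The binary tree mentioned for Hierarchical Softmax can be illustrated with a small sketch. It assumes the tree is a Huffman tree built from word counts, as in the original word2vec implementation; the abstract itself only says "binary tree". The toy reviews and the helper name `huffman_code_lengths` are hypothetical and are not taken from the paper's corpus or code. The sketch computes only the depth of each word's leaf, which is the number of binary decisions Hierarchical Softmax would evaluate for that word.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {word: leaf depth} for a Huffman tree built over word counts.

    Hypothetical helper for illustration only. The leaf depth equals the
    number of inner nodes on the path from the root to the word.
    """
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(freq, next(tie), {word: 0}) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new inner node.
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        merged = {w: d + 1 for w, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Toy Indonesian hotel reviews (made up for this sketch).
reviews = [
    "hotel bagus kamar bersih",
    "hotel bagus pelayanan ramah",
    "hotel buruk kamar kotor",
    "hotel bagus",
]
frequencies = Counter(token for review in reviews for token in review.split())
lengths = huffman_code_lengths(frequencies)

# Print words from most to least frequent with their leaf depth.
for word, freq in frequencies.most_common():
    print(f"{word:10s} count={freq} depth={lengths[word]}")
```

In such a tree, frequent words sit close to the root while rare words sit deeper, so a rare word's path passes through inner nodes that are also updated by other words. This is one way to read the abstract's remark that rarely appearing words inherit representations from the tree.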
