Release: Tyumen State University Herald. Physical and Mathematical Sciences. Informatics (No. 7, 2014)
About the authors: Elena G. Brunova, Dr. Sci. (Philol.), Professor, Head of the Department of Foreign Languages and Cross-Cultural Communication in Science, Institute of Mathematics and Computer Sciences, Tyumen State University
Abstract: This study in computational linguistics analyzes subjective information in user-generated content. A sentiment lexicon of 583 items is built; it is domain-specific (banking) and language-specific (Russian). The lexicon comprises the following classes: positive vocabulary, negative vocabulary, polarity modifiers, anti-modifiers, and increments. The REGEX algorithm, which incorporates elements of formal grammar, is proposed. Eleven formal grammar rules and the corresponding syntactic models are introduced; like regular expressions, they detect certain text elements, simplify each sentence, and present the text as a formal model. The SENTIMENTO system for evaluating bank service quality is implemented as an Internet application with an interface for testing and adjusting the model. The efficiency of the proposed algorithm is compared with that of the Naïve Bayes Classifier, using the F1 measure as the criterion. The system is tested on reviews published in the customer rating of banks (www.banki.ru), and the proposed algorithm is shown to perform better: on the same set of reviews, the F1 value is 0.920 for the proposed method and 0.872 for the Naïve Bayes Classifier.
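The general idea of combining a class-tagged lexicon with regular-expression-like rewrite rules can be illustrated with a minimal sketch. It is not the REGEX algorithm or the SENTIMENTO implementation: the English words, the one-letter class tags, and the three rules below are hypothetical stand-ins chosen for illustration, whereas the actual lexicon is Russian and the eleven rules are not given in the abstract. Anti-modifiers are omitted because their behavior is not defined here.

```python
import re

# Hypothetical toy lexicon (assumption): word -> class tag.
# P = positive, N = negative, M = polarity modifier, I = increment.
LEXICON = {
    "good": "P", "fast": "P", "polite": "P",
    "bad": "N", "slow": "N", "rude": "N",
    "not": "M", "never": "M",
    "very": "I", "extremely": "I",
}

# Hypothetical rewrite rules (assumption), applied in order to the tag string.
RULES = [
    # An increment doubles the sentiment word that follows it.
    (re.compile(r"I([PN])"), r"\1\1"),
    # A polarity modifier flips the run of positive words that follows it.
    (re.compile(r"MP+"), lambda m: "N" * (len(m.group()) - 1)),
    # A polarity modifier flips the run of negative words that follows it.
    (re.compile(r"MN+"), lambda m: "P" * (len(m.group()) - 1)),
]


def tag_sentence(sentence: str) -> str:
    """Map each word to its class tag; words outside the lexicon become 'O'."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return "".join(LEXICON.get(token, "O") for token in tokens)


def simplify(tags: str) -> str:
    """Apply each rewrite rule once, in order, to simplify the tag string."""
    for pattern, replacement in RULES:
        tags = pattern.sub(replacement, tags)
    return tags


def score(sentence: str) -> int:
    """Sentence score: number of P tags minus number of N tags after simplification."""
    tags = simplify(tag_sentence(sentence))
    return tags.count("P") - tags.count("N")


def classify(review: str) -> str:
    """Sum sentence scores over a review and map the total to a label."""
    total = sum(score(s) for s in re.split(r"[.!?]+", review))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

In this sketch, `tag_sentence("The service was not good")` gives the tag string `OOOMP`, which the second rule rewrites to `OOON`, so the sentence scores -1. Keeping the `O` tags in the string means a modifier only affects sentiment words directly adjacent to it, which is one possible design choice among several.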
References:
1. Carenini, G., et al. Extracting Knowledge from Evaluative Text. Proceedings of the 3rd International Conference on Knowledge Capture. 2005. Pp. 11-18.
2. Hu, M., Liu, B. Mining and Summarizing Customer Reviews. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004.
3. Nasukawa, T., Yi, J. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. Proceedings of the 2nd International Conference on Knowledge Capture. Florida. 2003. Pp. 70-77.
4. Pang, B., Lee, L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarisation Based on Minimum Cuts. Proceedings of the ACL. 2004. Pp. 271-278.
5. Turney, P. Thumbs Up or Thumbs Down?: Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002. Pp. 417-424.
6. Ermakov, S.A., Ermakova, L.M. Overview of Sentiment Analysis Methods. Vestnik Permskogo universiteta [Perm University Herald]. 2012. Issue 1(19). Pp. 85-89. (in Russian).
7. Lukashevich, N.V., Chetverkin, I.I. Retrieval and Application of Sentiment Lexicon in the Context of Reviews Classifying into Three Classes. Vychislitel'nye metody i programmirovanie [Computational Methods and Programming]. 2011. Vol. 12. Pp. 73-81. (in Russian).
8. Orobinskaja, E.A., Kochueva, Z.A. Text Mining Techniques: Review of Methods and Tasks of Content Processing. Vestnik Hersonskogo nacional'nogo tehnicheskogo universiteta [Herson National Technical University Herald]. 2010. No. 2(38). Pp. 348-353. (in Russian).
9. Pazel'skaja, A.G., Solov'ev, A.N. Method of Emotion Determination in Russian Texts. Komp'juternaja lingvistika i intellektual'nye tehnologii: «Dialog-2011» [Computational Linguistics and Intellectual Technologies]. 2011. Issue 10(17). Pp. 510-522. (in Russian).
10. Webb, G., et al. Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning. 2005. Vol. 58. Pp. 5-24.
11. Hatzivassiloglou, V., McKeown, K. Predicting the Semantic Orientation of Adjectives. Proc. of the 35th Annual Meeting of ACL. Madrid. 1997. Pp. 174-181.
12. Manning, Ch., Raghavan, P., Schütze, H. Vvedenie v informacionnyj poisk [Introduction to Information Retrieval]. Moscow, 2011. 520 p. (in Russian).