Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy


Tyumen State University Herald. Physical and Mathematical Sciences. Informatics (No. 7, 2014)

Algorithm with formal grammar elements for sentiment analysis

About the authors:

Elena G. Brunova, Dr. Sci. (Philol.), Professor, Head of the Department of Foreign Languages and Cross-Cultural Communication in Science, Institute of Mathematics and Computer Sciences, Tyumen State University
Yuliya V. Bidulya, Cand. Sci. (Philol.), Associate Professor, Department of Information Systems, University of Tyumen; y.v.bidulya@utmn.ru


This study in computational linguistics analyzes subjective information in user-generated content. A sentiment lexicon of 583 items is built; it is domain-specific (banking) and language-specific (Russian) and comprises the following classes: positive vocabulary, negative vocabulary, polarity modifiers, anti-modifiers, and increments. The REGEX algorithm with formal grammar elements is proposed. It introduces 11 formal grammar rules and the corresponding syntactic models; like regular expressions, they detect certain text elements, simplify each sentence, and represent the text as a formal model. The SENTIMENTO system for evaluating the quality of bank services is implemented as an Internet application with an interface for testing and adjusting the model. The efficiency of the proposed algorithm is compared with that of the Naïve Bayes classifier, using the F1 measure as the criterion. The system is tested on reviews published in the customer bank rating at www.banki.ru, and the proposed algorithm is shown to perform better: on the same set of reviews, it achieves an F1 value of 0.920, compared with 0.872 for the Naïve Bayes classifier.
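The general idea of combining a class-tagged lexicon with regex-like rewrite rules can be illustrated with a minimal Python sketch. It is not the authors' REGEX algorithm: the abstract does not list the 11 rules or the lexicon entries, so the English toy lexicon, the single-letter tags (P, N, M, I), the four rules, and the function names below are all hypothetical, and anti-modifiers are not modeled.

```python
import re

# Hypothetical toy lexicon with English stand-ins. The lexicon described in
# the abstract is Russian, banking-specific, and has 583 items; none of its
# entries are reproduced here.
LEXICON = {
    "good": "P", "helpful": "P", "fast": "P",   # positive vocabulary
    "bad": "N", "rude": "N", "slow": "N",       # negative vocabulary
    "not": "M",                                 # polarity modifier
    "very": "I",                                # increment (intensifier)
}

# Illustrative rewrite rules over the tag string (assumed, not the paper's).
RULES = [
    (re.compile(r"MI?P"), "N"),   # modified positive becomes negative
    (re.compile(r"MI?N"), "P"),   # modified negative becomes positive
    (re.compile(r"IP"), "PP"),    # intensified positive counts twice
    (re.compile(r"IN"), "NN"),    # intensified negative counts twice
]

def to_tags(sentence):
    """Map a sentence to a string of class tags, dropping unknown words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return "".join(LEXICON.get(word, "") for word in words)

def simplify(tags):
    """Apply each rewrite rule once, in order, to the tag string."""
    for pattern, replacement in RULES:
        tags = pattern.sub(replacement, tags)
    return tags

def classify(sentence):
    """Label a sentence by comparing positive and negative tag counts."""
    tags = simplify(to_tags(sentence))
    score = tags.count("P") - tags.count("N")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for text in ("The staff was not helpful",
                 "Very fast service and good rates",
                 "I opened an account"):
        print(text, "->", classify(text))
```

Dropping unknown words before applying the rules is one possible reading of "simplify each sentence"; it lets a modifier act across intervening words, which may or may not match the behavior of the published rules.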


