Approach to modeling of automatic text classification problem (case study of the audience age prediction)

Tyumen State University Herald. Physical and Mathematical Modeling. Oil, Gas, Energy


Release:

Tyumen State University Herald. Physical and Mathematical Sciences. Informatics (№ 7, 2014)


About the authors:

Anna V. Glazkova, Assistant, Department of Software, Institute of Mathematics and Computer Sciences, Tyumen State University
Irina G. Zakharova, Cand. Sci. (Phys.-Math.), Professor, Department of Software, School of Computer Science, University of Tyumen, Tyumen, Russia; i.g.zakharova@utmn.ru, https://orcid.org/0000-0002-4211-7675

Abstract:

The article considers the problem of automatic text classification, taking the prediction of a text's audience age as a case study. The paper describes several possible ways to formalize the problem and discusses their advantages and disadvantages. An approach to mathematical modeling of the domain is proposed in which a category is represented as a set of classification features and their critical values, and a text is represented as a set of text features and their values. In this setting, classification by a feature can be represented as a mapping of the set of texts into the set of permissible values of that feature. The final part of the paper substantiates the use of neural network technology as a tool for the computer implementation of classification algorithms and gives a brief review of the literature on applying neural networks to automatic text classification. The authors' approach has been implemented with neural network technology as a prototype software system.
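The formal model summarized above can be illustrated with a small sketch. It assumes two hypothetical text features (mean sentence length in words and mean word length in characters), hypothetical age groups, and hand-picked critical values; none of these are taken from the paper. It also swaps in a plain threshold rule for the neural network the authors use, so it shows only the shape of the model: a text becomes a set of feature values, a category is a set of classification features with critical values, and classification by the "target age" feature is a function from texts to that feature's permissible values.

```python
import re
from dataclasses import dataclass
from typing import Dict, Mapping, Sequence

# A text is modeled as a set of text features and their values.
TextFeatures = Dict[str, float]


def extract_features(text: str) -> TextFeatures:
    """Compute two illustrative text features (hypothetical, not from the paper)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return {"mean_sentence_len": 0.0, "mean_word_len": 0.0}
    return {
        "mean_sentence_len": len(words) / len(sentences),
        "mean_word_len": sum(len(w) for w in words) / len(words),
    }


@dataclass(frozen=True)
class Category:
    """A category: classification features paired with their critical values."""
    label: str
    critical_values: Mapping[str, float]

    def admits(self, features: TextFeatures) -> bool:
        # Assumed rule: every listed feature must reach its critical value.
        return all(features.get(name, 0.0) >= threshold
                   for name, threshold in self.critical_values.items())


def classify(text: str, categories: Sequence[Category], default: str) -> str:
    """Map a text to one permissible value of the classification feature.

    Categories are tried from the most to the least demanding; the first one
    whose critical values are all reached gives the value, else `default`.
    """
    features = extract_features(text)
    for category in categories:
        if category.admits(features):
            return category.label
    return default


# Hypothetical age groups and thresholds, not calibrated on any data.
AGE_GROUPS = [
    Category("adult", {"mean_sentence_len": 15.0, "mean_word_len": 5.0}),
    Category("teen", {"mean_sentence_len": 8.0}),
]

if __name__ == "__main__":
    print(classify("The cat sat. It was big. We ran.", AGE_GROUPS, "child"))  # child
```

Trying the categories in a fixed order keeps the mapping single-valued: every text receives exactly one permissible value, with `child` as the fallback. In a neural implementation such as the one the paper describes, the hand-set thresholds would presumably be replaced by parameters learned from labeled texts; the sketch does not attempt that.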

References:

1. Thakkar, K., Shrawankar, U. Test Model for Text Categorization and Text Summarization. International Journal on Computer Science and Engineering. 2013. № 3. Pp. 1539–1545.

2. Zhang, M., Zhou, Z. Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization. IEEE Transactions on Knowledge and Data Engineering. 2006. № 18(10). Pp. 1338–1351.

3. Borisova, N.F., Kochueva, Z.A., Sharonova, N.V., Hajrova, N.F. Modeling of systematization and classification procedures for data objects by identifying comparator. Vestnik Hersonskogo nacional'nogo tehnicheskogo universiteta — Kherson National Technical University Herald. 2012. № 1. Pp. 91–95. (in Russian).

4. Kamenskaja, O.L. Tekst i kommunikacija [Text and communication]. Moscow, 1990. 78 p. (in Russian).

5. Ajvazjan, S.A., Buhshtaber, V.M., Enjukov, I.S., Meshalkin, L.D. Prikladnaja statistika: klassifikacija i snizhenie razmernosti [Applied statistics: classification and reduction of dimension]. Moscow, 1989. 607 p. (in Russian).

6. Zaharova, I.G., Pushkarev, A.N. Software for the dynamic integrated expert support system of decision-making in marketing. Vestnik Tjumenskogo gosudarstvennogo universiteta — Tyumen State University Herald. 2012. № 4. Pp. 151–155. (in Russian).

7. Dunaev, V.V. On a model of classification. Nauchno-tehnicheskaja informacija. Ser. 2 — Scientific and technological information. Series 2. 1990. № 3. Pp. 22–27. (in Russian).

8. Jones, M.T. Programmirovanie iskusstvennogo intellekta v prilozhenijah [Artificial intelligence application programming] / Transl. fr. Eng. by A.I. Osipov. Moscow, 2013. 312 p. (in Russian).

9. Ruiz, M., Srinivasan, P. Hierarchical Text Categorization Using Neural Networks. Information Retrieval. 2002. № 5(1). Pp. 87–118.

10. Shevelev, O.G., Petrakov, A.V. Text classification using decision trees and neural backpropagation networks. Vestnik Tomskogo gosudarstvennogo universiteta — Tomsk State University Herald. 2006. № 290. Pp. 300–307. (in Russian).

11. Jo, T. NTC (Neural Text Categorizer): Neural Network for Text Categorization. International Journal of Information Studies. 2010. № 2(2). Pp. 83–96.

12. Ramasundaram, S., Victor, S. Text Categorization by Backpropagation Network. International Journal of Computer Applications. 2010. № 8(6). Pp. 1–5.

13. Koshkin, D.E. Texts clusterization using neural networks and temporal evaluation of the algorithm. Filosofskie problemy informacionnyh tehnologij i kiberprostranstva — Philosophical Problems of Information Technology and Cyberspace. 2012. № 1. Pp. 72–78. (in Russian).

14. Russian National Corpus. 2003–2014. URL: http://ruscorpora.ru (accessed: 30.04.2014). (in Russian).