Release:
2020, Vol. 6, No. 3 (23)
About the authors:
Anastasiia Yu. Zinoveva, Postgraduate Student, Department of Linguistics and Translation Studies, South Ural State University (Chelyabinsk); zinovevaaiu@bk.ru; ORCID: 0000-0002-7658-7376
Abstract:
Properly annotated text corpora are essential for building effective and efficient natural language processing (NLP) tools, which offer practical solutions to both theoretical and applied problems in linguistics and information processing. One of the central and most complex problems in corpus annotation is resolving tag ambiguity at a given annotation level (morphological, syntactic, semantic, etc.).
This paper addresses ambiguity that emerges at the conceptual level, the annotation level most relevant to information-processing tasks. Conceptual annotation is a special type of semantic annotation, usually applied to domain corpora to address specific information-processing problems such as automatic classification, content and trend analysis, machine learning, and machine translation.
In conceptual annotation, text corpora are annotated with tags that reflect the content of a particular domain, which gives rise to a type of ambiguity distinct from general semantic ambiguity. This conceptual ambiguity has both universal features and language- and domain-specific ones. This paper investigates conceptual ambiguity in a case study of a Russian-language corpus on terror attacks.
The research methodology combines automated and manual steps: (a) statistical and qualitative corpus analysis; (b) the use of pre-developed annotation resources (a terrorism domain ontology, a Russian ontolexicon, and a computer platform for conceptual annotation); (c) conceptual annotation of the case-study corpus based on ontological analysis; (d) corpus-based detection and investigation of the causes of conceptual ambiguity; and (e) development and experimental study of possible disambiguation methods for some types of conceptual ambiguity.
The findings of this study are specific to Russian-language texts in the terrorism domain, but the conceptual annotation technique and the approaches to conceptual disambiguation developed here are applicable to other domains and languages.
Keywords: