Key Term Extraction Based on a Corpus of Oil and Gas Field Development Discourse

Tyumen State University Herald. Humanities Research. Humanitates


Issue:

2016, Vol. 2, No. 3

Title: 
Key Term Extraction Based on a Corpus of Oil and Gas Field Development Discourse


About the author:

Marina A. Kovyazina, Cand. Sci. (Philol.), Associate Professor, Department of English Philology and Translation, Institute of Philology and Journalism, University of Tyumen; makovyazina@mail.ru

Abstract:

The paper presents a study of term extraction from a text corpus. The author uses the corpus analysis toolkit “AntConc” and the corpus query system “Sketch Engine” to compile a corpus of texts on oil and gas field development processes, stages, and methods, and to extract the key terminology of the domain. Three corpus methods are used to identify the terminology of oil and gas field development discourse: analysing word frequency lists, generating a list of key words and terms ranked by keyness score, and building a distributional thesaurus using the logDice coefficient. As a result, the terms synonymous with the key notion “field development” have been grouped, and the key domain-specific and general scientific terminology has been extracted.
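
The two statistics named in the abstract can be illustrated with a minimal sketch. It assumes the keyness score is the smoothed ratio of normalised frequencies (frequency per million in the focus corpus plus a constant, divided by the same quantity for a reference corpus) and that logDice is computed as 14 + log2(2·f_xy / (f_x + f_y)); the abstract does not state the exact settings used in the study. The function names, the smoothing default of 1, and all counts below are hypothetical and serve only as an illustration.

```python
import math


def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count: int, focus_size: int,
            ref_count: int, ref_size: int,
            smoothing: float = 1.0) -> float:
    """Smoothed ratio of normalised frequencies (assumed keyness formula).

    A higher score means the item is more typical of the focus corpus
    than of the reference corpus. The smoothing constant avoids division
    by zero for items absent from the reference corpus.
    """
    fpm_focus = per_million(focus_count, focus_size)
    fpm_ref = per_million(ref_count, ref_size)
    return (fpm_focus + smoothing) / (fpm_ref + smoothing)


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy is the co-occurrence count of the two items; f_x and f_y are
    their individual counts. The theoretical maximum is 14.
    """
    if f_xy <= 0:
        raise ValueError("logDice is undefined without co-occurrences")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts, for illustration only (not data from the study).
score = keyness(focus_count=120, focus_size=500_000,
                ref_count=30, ref_size=10_000_000)
association = log_dice(f_xy=1, f_x=2, f_y=2)
print(score, association)
```

Because logDice depends only on the counts of the two items and their co-occurrences, and not on corpus size, scores from corpora of different sizes stay comparable, which is convenient when a small specialised corpus is set against a large reference corpus.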
