Lexical Diversity Measures’ Review and Classification

Tyumen State University Herald. Humanities Research. Humanitates


2020, Vol. 6. № 1 (21)


For citation: Zakharova E. Yu., Savina O. Yu. 2020. “Lexical Diversity Measures’ Review and Classification”. Tyumen State University Herald. Humanities Research. Humanitates, vol. 6, no. 1 (21), pp. 20-34. DOI: 10.21684/2411-197X-2020-6-1-20-34

About the authors:

Elena Yu. Zakharova, Undergraduate Student, Department of German Philology, University of Tyumen; helzakh@mail.ru; ORCID: 0000-0002-6511-600X

Olga Yu. Savina, Cand. Sci. (Philol.), Associate Professor, Department of German Philology, University of Tyumen; o.y.savina@utmn.ru; ORCID: 0000-0002-4777-3188


This paper reviews and classifies various lexical diversity (LD) measures. The authors identify the main advantages and disadvantages of the measures and survey the main areas in which LD is applied: measuring LD in the speech of children and of people with aphasia, tracking progress in foreign-language learning, and comparing the writing styles of individual authors. The review shows that the most frequently used measure is the type-token ratio (TTR), defined as the ratio of the number of different words (types) to the total number of words (tokens).
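The TTR definition can be illustrated with a short Python sketch. The tokenizer here is a simplifying assumption (lowercasing and splitting on runs of word characters), and the names `tokenize` and `type_token_ratio` are hypothetical helpers, not part of any tool discussed in the paper.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude assumption)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Return the number of distinct tokens (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)

# 13 tokens, 8 of them distinct, so TTR = 8 / 13
print(f"tokens={len(tokens)} types={len(set(tokens))} TTR={ttr:.3f}")
```

A real study would also have to decide how to treat inflected forms, contractions, and punctuation, since those choices change what counts as a type.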

The main problem with TTR and with other measures derived from it is that the TTR value falls as the number of tokens in a text grows. This has prompted the development of alternative measures. Some of them are still based on the TTR formula, so they do not solve the problem: their results also depend on text length, and texts of different lengths therefore cannot be compared.
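The length effect can be seen in a toy Python sketch. It assumes an extreme case in which a text keeps reusing the same passage, so the number of types stays fixed while the number of tokens grows. Real texts behave less drastically, but the direction is the same because frequent words recur. The passage and the helper name `type_token_ratio` are illustrative assumptions.

```python
def type_token_ratio(tokens):
    """Return the number of distinct tokens divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


# Toy passage: 13 tokens, 8 types.
passage = "the cat sat on the mat and the dog sat on the rug".split()

# Build longer and longer texts from the same vocabulary and record TTR.
ttr_by_length = {}
for copies in (1, 2, 4, 8):
    text = passage * copies
    ttr_by_length[len(text)] = type_token_ratio(text)

for n_tokens, ttr in ttr_by_length.items():
    print(f"{n_tokens:4d} tokens  TTR = {ttr:.3f}")
```

The vocabulary is identical in all four texts, yet the TTR falls each time the length doubles, which is why raw TTR values from texts of different lengths are not comparable.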

Another group of measures combines the TTR formula with a sampling principle. These measures solve the text-length problem partially or completely, though they often require additional tools. Fortunately, such tools are available online and require no knowledge of their inner workings or of programming.
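One measure of this kind is the moving-average type-token ratio (MATTR), which computes the TTR inside a fixed-size window that slides over the text one token at a time and then averages the results. The sketch below is a minimal illustration under stated assumptions: the function name `mattr`, the toy passage, and the window of 5 tokens are chosen for demonstration only, and practical window sizes are much larger.

```python
def mattr(tokens, window):
    """Average the TTR over every window of `window` consecutive tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) < window:
        raise ValueError("text is shorter than the window")
    ratios = [
        len(set(tokens[start:start + window])) / window
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy passage: 13 tokens, 8 types.
passage = "the cat sat on the mat and the dog sat on the rug".split()
short_text = passage        # 13 tokens
long_text = passage * 8     # 104 tokens, same vocabulary

mattr_short = mattr(short_text, window=5)
mattr_long = mattr(long_text, window=5)
ttr_short = len(set(short_text)) / len(short_text)
ttr_long = len(set(long_text)) / len(long_text)

print(f"TTR:   short={ttr_short:.3f}  long={ttr_long:.3f}")
print(f"MATTR: short={mattr_short:.3f}  long={mattr_long:.3f}")
```

Because every window has the same size, the windowed value barely moves when the toy text becomes eight times longer, whereas the plain TTR drops sharply. The price is an extra parameter, the window size, which has to be fixed before texts are compared.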

Contemporary researchers tend to prefer length-independent measures, because texts usually differ in length and length-dependent measures cannot give reliable results.

