Tagging Errors in Non-Native English Language Student-Composed Texts of Different Registers

Authors

Z. Vinčela

DOI:

https://doi.org/10.22364/BJELLC.04.2014.10

Keywords:

corpus, annotation, accuracy, tagging error

Abstract

Research on linguistic features requires part-of-speech (POS) tagging of texts. Existing POS taggers have been trained predominantly on texts written by native speakers in order to enhance their accuracy. Researchers exploring POS tagging of texts written by English language learners (ELLs) distinguish between tagger errors and learner errors and suggest annotation enhancement schemes. However, the frequency and types of CLAWS7 (Constituent Likelihood Automatic Word Tagging System) tagging errors in ELL texts serving different communicative purposes have not been explored sufficiently to suggest annotation enhancement solutions for each particular learner corpus being built. This study investigates CLAWS7-tagged texts composed by non-native BA students of English philology (English Studies Department, University of Latvia) in order to determine the overall tagging precision and the tags having the greatest impact on the error rate, and to provide insight into the errors, revealing which texts require annotation enhancement solutions. The material for the analysis has been selected from a corpus of student-composed texts. The results show that tagging precision varies across the text groups. Texts edited by the students show greater tagging precision and therefore would not require specific annotation enhancement procedures before tagging. Tagging precision is lower in interactional texts such as chat messages, a problem that could be addressed by applying an annotation enhancement scheme.
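Tagging precision can be read as the proportion of tokens whose automatically assigned tag matches a manually verified tag. The Python sketch below shows one way to compute it, together with per-tag precision and each tag's share of the errors, assuming the tagger output and the corrected tags are already aligned token by token. The function name, the per-tag breakdown and the tag sequences (CLAWS7-style codes for a hypothetical chat message "u r late") are illustrative assumptions, not the study's procedure or actual CLAWS7 output.

```python
from collections import Counter


def tagging_precision(auto_tags, gold_tags):
    """Compare tagger output with manually verified tags, token by token.

    Hypothetical helper: returns (overall precision, per-tag precision,
    share of all errors attributable to each wrongly assigned tag).
    """
    if len(auto_tags) != len(gold_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not auto_tags:
        raise ValueError("no tokens to evaluate")

    pairs = list(zip(auto_tags, gold_tags))
    assigned = Counter(auto_tags)                      # how often the tagger used each tag
    correct = Counter(a for a, g in pairs if a == g)   # correctly assigned tags
    wrong = Counter(a for a, g in pairs if a != g)     # wrongly assigned tags

    overall = sum(correct.values()) / len(pairs)
    per_tag = {tag: correct[tag] / n for tag, n in assigned.items()}

    # Tags with the largest error share have the greatest impact on the error rate.
    total_errors = sum(wrong.values())
    error_share = {tag: n / total_errors for tag, n in wrong.items()} if total_errors else {}
    return overall, per_tag, error_share


# Hypothetical chat-style tokens "u r late": assumed tagger output vs. corrected tags.
auto = ["ZZ1", "ZZ1", "JJ"]
gold = ["PPY", "VBR", "JJ"]
overall, per_tag, error_share = tagging_precision(auto, gold)
print(round(overall, 3))  # 0.333
print(per_tag)            # {'ZZ1': 0.0, 'JJ': 1.0}
print(error_share)        # {'ZZ1': 1.0}
```

Attributing each error to the tag the tagger wrongly assigned is one design choice; attributing it to the correct (gold) tag instead would show which word classes the tagger misses rather than which tags it overuses.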

Published

2014-04-25

How to Cite

Vinčela, Z. (2014). Tagging Errors in Non-Native English Language Student-Composed Texts of Different Registers. Baltic Journal of English Language, Literature and Culture, 4, 122-129. https://doi.org/10.22364/BJELLC.04.2014.10