The Application of Diachronic Corpus Compilation Principles in a Pilot Study of Subjectivity
DOI:
https://doi.org/10.22364/BJELLC.13.2023.04
Keywords:
corpus creation, frequency, surplus-deficit index, subjectivity, first-person pronouns
Abstract
Researchers claim (see Egbert, 2018) that, despite the growing number of corpora, the challenges of corpus creation and analysis receive insufficient research and discussion. The ongoing international project LEXECON (2021-2024) raises awareness of these issues. The goal of this study is twofold: firstly, to explore the corpus creation stages in relation to compilation criteria; and secondly, to pilot the functionality of the created subcorpus by examining first-person pronoun variation to uncover subjectivity across the subcorpus genres. The pronouns were explored by observing their relative frequency, context, and surplus-deficit index. Two corpus analysis tools, Sketch Engine and Hyperbase 10, were applied. The corpus creation results confirm that balance is the most challenging corpus criterion to fulfil, whereas corpus editing is the most time-consuming corpus creation stage. The results obtained via first-person pronoun extraction confirm that the context and the surplus-deficit index contribute to the research results no less than the relative frequency data. The analysis of personal pronoun variation shows that essays contain the fewest first-person singular pronouns; in the other genres, however, these pronouns often do not convey an authorial stance. Moreover, a greater surplus of the possessive case, as opposed to the objective case, reflects a more active authorial stance.
References
Baker, P., Hardie, A. and McEnery, T. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press. DOI: https://doi.org/10.1515/9780748626908
Biber, D. (1993) Representativeness in corpus design. Literary and Linguistic Computing, 8 (4): 243-259. DOI: https://doi.org/10.1093/llc/8.4.243
Biber, D., Johansson, S., Leech, G. N., Conrad, S. and Finegan, E. (2021) Grammar of Spoken and Written English. Amsterdam/Philadelphia: John Benjamins Publishing Company. DOI: https://doi.org/10.1075/z.232
Baumgarten, N., Du Bois, I. and House, J. (2012) Introduction. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 1-14). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921
Bollmann, M. (2019) A large-scale comparison of historical text normalization systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers), (pp. 3885-3898). Minneapolis, Minnesota: Association for Computational Linguistics.
Brezina, V. (2018) Statistics for Corpus Linguistics. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781316410899
Brunet, E. (2011) Hyperbase. Manuel de référence. Available from http://hyperbase.unice.fr/hyperbase/doc/manuel.pdf [Accessed on 7 September 2022].
Bullions, P. (1847) The Principles of English Grammar. New York: Pratt, Woodford & Co. Available from https://catalog.hathitrust.org/Record/100619837 [Accessed on 1 November 2022].
Chadbourne, R. M. (1983) A puzzling literary genre: Comparative views of the essay. Comparative Literature Studies, 20 (2): 133-153. Available from https://www.jstor.org/stable/40246392 [Accessed on 2 October 2021].
Dash, N. S. and Ramamoorthy, L. (2019) Corpus editing and text normalization. In Utility and Application of Language Corpora (pp. 35-56). Singapore: Springer. DOI: https://doi.org/10.1007/978-981-13-1801-6_3
Egbert, J. (2018) Corpus design and representativeness. In T. B. Sardinha and M. V. Pinto (eds.) Multi-Dimensional Analysis: research methods and current issues (pp. 27-42). London: Bloomsbury Academic. DOI: https://doi.org/10.5040/9781350023857.0010
Egbert, J. (2019) Usage-based theories of Construction Grammar: triangulating corpus linguistics and psycholinguistics. In J. Egbert and P. Baker (eds.) Using Corpus Methods to Triangulate Linguistic Analysis. New York: Routledge. DOI: https://doi.org/10.4324/9781315112466
Egbert, J., Larsson, T. and Biber, D. (2020) Doing Linguistics with a Corpus. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781108888790
Fina, A. (2009) Language and subjectivity. Estudios de Lingüística Aplicada, 27 (50): 117-176.
Gablasova, D., Brezina, V. and McEnery, T. (2019) The Trinity Lancaster Corpus: Development, description and application. International Journal of Learner Corpus Research, 5 (2): 126-158. DOI: https://doi.org/10.1075/ijlcr.5.2
Gatto, M. (2014) Web as Corpus: theory and practice. London: A&C Black. Available from https://books.google.lv/books?id=J4NnAgAAQBAJ&source=gbs_navlinks_s [Accessed on 23 August 2022].
Guidi, M. E. L., Tusset, G., Guccione, C., Lupetti, M., Carpi, E., Henrot Sostero, G., Musacchio, M. T., Simon, F., Flinz, C., Beghini, F., Bientinesi, F., Caldari, K., Cammalleri, C. M., Migliorelli, M., Morleo, M., Pagliai, L., Pomini, M., Quinci, C., Romeo, M., Rosati, F., Sclafani, M. D., Vaccarelli, F. and Vezzani, F. (2021) LEXECON. The international research network on the economics lexicon. The Economic Teacher: A transnational and diachronic study of treatises and textbooks of economics (18th to 20th century). Project: Translations and the Circulation of Economic Ideas. Available from https://www.shorturl.at/luyT2 [Accessed on 23 August 2022].
Hanks, P. (2012) The corpus revolution in lexicography. International Journal of Lexicography, 25 (4): 398-436. DOI: https://doi.org/10.1093/ijl/ecs026
House, J. (2012) Subjectivity in English lingua franca interactions. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 139-156). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921_008
Kilgarriff, A. and Grefenstette, G. (2003) Introduction to a special issue on the web as corpus. Computational Linguistics, 29 (3): 333-347. Available from https://aclanthology.org/J03-3001.pdf [Accessed on 3 November 2022]. DOI: https://doi.org/10.1162/089120103322711569
Kilgarriff, A. and Rychly, P. (2008) Finding the Words Which Are Most X. Available from https://www.sketchengine.eu/wp-content/uploads/2015/05/Finding_the_words_2008.pdf [Accessed on 7 September 2022].
Langacker, R. W. (2009) Investigations in Cognitive Grammar. Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110214369
Malavska, V. (2016) Genre of an academic lecture. International Journal on Language Literature and Culture in Education, 3 (2): 56-84. Available from https://www.researchgate.net/publication/310815820_Genre_of_an_Academic_Lecture [Accessed on 5 May 2022]. DOI: https://doi.org/10.1515/llce-2016-0010
McEnery, T. and Brooks, G. (2022) Building a written corpus: what are the basics? In A. O’Keeffe and M. J. McCarthy (eds.) The Routledge Handbook of Corpus Linguistics, 2nd ed. (n. p.). New York: Routledge. Available from https://www.taylorfrancis.com/chapters/edit/10.4324/9780367076399-2/building-corpus-key-considerations-randi-reppen [Accessed on 15 September 2022].
McEnery, T. and Hardie, A. (2012) Corpus Linguistics: method, theory and practice. New York: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511981395
McEnery, T. and Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.
McEnery, T. and Xiao, R. (2005) Character encoding in corpus construction. In Developing Linguistic Corpora: a guide to good practice. Available from http://icar.cnrs.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf [Accessed on 1 September 2022].
McEnery, T., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: an advanced resource book. New York: Routledge. Available from https://www.jbe-platform.com/content/journals/10.1075/ijcl.11.4.09w [Accessed on 23 August 2022].
Pho, P. D. (2012) Authorial stance in research article abstracts and introductions from two disciplines. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 97-114). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921_006
Reppen, R. (2022) Building a corpus: what are the key considerations? In A. O’Keeffe and M. J. McCarthy (eds.) The Routledge Handbook of Corpus Linguistics, 2nd ed. (n. p.). New York: Routledge. Available from https://www.taylorfrancis.com/chapters/edit/10.4324/9780367076399-2/building-corpus-key-considerations-randi-reppen [Accessed on 15 September 2022]. DOI: https://doi.org/10.4324/9780367076399-2
Sampson, G. (2013) The empirical trend. International Journal of Corpus Linguistics, 18 (2): 281-289. DOI: https://doi.org/10.1075/ijcl.18.2.05sam
Sinclair, J. (2005) Corpus and text – basic principles. In M. Wynne (ed.) Developing Linguistic Corpora: A Guide to Good Practice (pp. 1-16). Oxford: Oxbow Books. Available from https://users.ox.ac.uk/~martinw/dlc/index.htm [Accessed on 6 November 2022].
Sinclair, J. (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Tantucci, V. (2021) Language and Social Minds: the semantics and pragmatics of intersubjectivity. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781108676441
Vinčela, Z. (2017) Canadian dollar in the English language varieties: corpus based study. Baltic Journal of English Language, Literature and Culture, 7: 161-171. DOI: https://doi.org/10.22364/BJELLC.07.2017.10
Weisser, M. (2016) Practical Corpus Linguistics. Hoboken, New Jersey: Wiley Blackwell. DOI: https://doi.org/10.1002/9781119180180
Hyperbase 10. Available from http://ancilla.unice.fr/pages/logiciel/ [Accessed on 7 September 2022].
Sketch Engine. Available from https://www.sketchengine.eu/ [Accessed on 7 September 2022].
Copyright (c) 2023 University of Latvia
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.