The Application of Diachronic Corpus Compilation Principles in a Pilot Study of Subjectivity

Authors

Kristīna Korneliusa, Zigrīda Vinčela

DOI:

https://doi.org/10.22364/BJELLC.13.2023.04

Keywords:

corpus creation, frequency, surplus-deficit index, subjectivity, first-person pronouns

Abstract

Researchers claim (see Egbert, 2018) that, despite the growing number of corpora, too little attention is paid to researching and discussing the challenges of corpus creation and analysis. The ongoing international project LEXECON (2021-2024) raises awareness of such issues. The goal of this study is twofold: firstly, to explore the stages of corpus creation in relation to compilation criteria; and secondly, to pilot the functionality of the resulting subcorpus by examining variation in first-person pronouns in order to uncover subjectivity across the subcorpus genres. The pronouns were explored by observing their relative frequency, context, and surplus-deficit index. Two corpus analysis tools, Sketch Engine and Hyperbase 10, were used. The corpus creation results confirm that balance is the most challenging corpus criterion to fulfil, whereas corpus editing is the most time-consuming stage of corpus creation. The results obtained via first-person pronoun extraction confirm that context and the surplus-deficit index contribute to the research results no less than relative frequency data. The analysis of variation in personal pronouns shows that essays contain the fewest first-person singular pronouns; however, in other genres, these pronouns often do not convey an authorial stance. Moreover, a greater surplus of the possessive case, as opposed to the objective case, reflects a more active authorial stance.
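The abstract relies on two quantitative measures, relative frequency and the surplus-deficit index. The Python sketch below illustrates both under stated assumptions: relative frequency is normalized per million tokens, and the surplus-deficit index is approximated by a z-score (observed minus expected count, divided by the binomial standard deviation), where the expected count assumes a form is spread across subcorpora in proportion to their size. This is not necessarily the formula implemented in Hyperbase 10 or used in the study; the function names, genre labels, and all counts are hypothetical.

```python
from math import sqrt


def relative_frequency(count: int, tokens: int, per: int = 1_000_000) -> float:
    """Occurrences of a form normalized to `per` tokens (default: per million words)."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return count / tokens * per


def surplus_deficit(count: int, sub_tokens: int, total_count: int, total_tokens: int) -> float:
    """Signed standardized deviation of an observed count from its expected value.

    ASSUMPTION (hypothetical approximation, not Hyperbase's documented formula):
    a binomial model in which each occurrence of the form falls into the
    subcorpus with probability p = sub_tokens / total_tokens.
    Positive result = surplus, negative result = deficit.
    """
    p = sub_tokens / total_tokens
    if not 0 < p < 1 or total_count <= 0:
        raise ValueError("need a proper subcorpus and at least one occurrence")
    expected = total_count * p                # count expected under proportional spread
    sd = sqrt(total_count * p * (1 - p))      # binomial standard deviation
    return (count - expected) / sd


# Invented toy data (not from the study): occurrences of "I" and subcorpus size
# in tokens; "genre_B" and "genre_C" are placeholder genre names.
subcorpora = {
    "essay": (40, 100_000),
    "genre_B": (300, 100_000),
    "genre_C": (160, 200_000),
}
total_count = sum(c for c, _ in subcorpora.values())
total_tokens = sum(t for _, t in subcorpora.values())

for genre, (c, t) in subcorpora.items():
    rf = relative_frequency(c, t)
    idx = surplus_deficit(c, t, total_count, total_tokens)
    print(f"{genre:8} {rf:7.1f} per million  index {idx:+6.2f}")
```

With these invented counts, the essay subcorpus receives a negative index (a deficit) and genre_B a positive one (a surplus). Unlike relative frequency alone, the signed index compares each genre with the corpus as a whole and scales the difference by its expected variability.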

Author Biographies

  • Kristīna Korneliusa, University of Latvia

    Kristīna Korneliusa is currently a Ph.D. student working at the University of Latvia. Her research interests include corpus linguistics, systemic functional linguistics, and stylistics.

  • Zigrīda Vinčela, University of Latvia

    Zigrīda Vinčela (Dr. philol., Assoc. prof. in Applied Linguistics) is currently working at the University of Latvia. Her research interests include corpus-based and corpus-driven studies of written and spoken texts as well as some aspects of phonetics and phonology.

References

Baker, P., Hardie, A. and McEnery, T. (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press. DOI: https://doi.org/10.1515/9780748626908

Baumgarten, N., Du Bois, I. and House, J. (2012) Introduction. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 1-14). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921

Biber, D. (1993) Representativeness in corpus design. Literary and Linguistic Computing, 8 (4): 243-259. DOI: https://doi.org/10.1093/llc/8.4.243

Biber, D., Johansson, S., Leech, G. N., Conrad, S. and Finegan, E. (2021) Grammar of Spoken and Written English. Amsterdam/Philadelphia: John Benjamins Publishing Company. DOI: https://doi.org/10.1075/z.232

Bollmann, M. (2019) A large-scale comparison of historical text normalization systems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers), (pp. 3885-3898). Minneapolis, Minnesota: Association for Computational Linguistics.

Brezina, V. (2018) Statistics for Corpus Linguistics. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781316410899

Brunet, E. (2011) Hyperbase. Manuel de référence. Available from http://hyperbase.unice.fr/hyperbase/doc/manuel.pdf [Accessed on 7 September 2022].

Bullions, P. (1847) The Principles of English Grammar. New York: Pratt, Woodford & Co. Available from https://catalog.hathitrust.org/Record/100619837 [Accessed on 1 November 2022].

Chadbourne, R. M. (1983) A puzzling literary genre: Comparative views of the essay. Comparative Literature Studies, 20 (2): 133-153. Available from https://www.jstor.org/stable/40246392 [Accessed on 2 October 2021].

Dash, N. S. and Ramamoorthy, L. (2019) Corpus editing and text normalization. In Utility and Application of Language Corpora (pp. 35-56). Singapore: Springer. DOI: https://doi.org/10.1007/978-981-13-1801-6_3

Egbert, J. (2018) Corpus design and representativeness. In T. B. Sardinha and M. V. Pinto (eds.) Multi-Dimensional Analysis: research methods and current issues (pp. 27-42). London: Bloomsbury Academic. DOI: https://doi.org/10.5040/9781350023857.0010

Egbert, J. (2019) Usage-based theories of Construction Grammar: triangulating corpus linguistics and psycholinguistics. In J. Egbert and P. Baker (eds.) Using Corpus Methods to Triangulate Linguistic Analysis. New York: Routledge. DOI: https://doi.org/10.4324/9781315112466

Egbert, J., Larsson, T. and Biber, D. (2020) Doing Linguistics with a Corpus. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781108888790

Fina, A. (2009) Language and subjectivity. Estudios de Lingüística Aplicada, 27 (50): 117-176.

Gablasova, D., Brezina, V. and McEnery, T. (2019) The Trinity Lancaster Corpus: development, description and application. International Journal of Learner Corpus Research, 5 (2): 126-158. DOI: https://doi.org/10.1075/ijlcr.5.2

Gatto, M. (2014) Web as Corpus: theory and practice. London: A&C Black. Available from https://books.google.lv/books?id=J4NnAgAAQBAJ&source=gbs_navlinks_s [Accessed on 23 August 2022].

Guidi, M. E. L., Tusset, G., Guccione, C., Lupetti, M., Carpi, E., Henrot Sostero, G., Musacchio, M. T., Simon, F., Flinz, C., Beghini, F., Bientinesi, F., Caldari, K., Cammalleri, C. M., Migliorelli, M., Morleo, M., Pagliai, L., Pomini, M., Quinci, C., Romeo, M., Rosati, F., Sclafani, M. D., Vaccarelli, F. and Vezzani, F. (2021) LEXECON. The international research network on the economics lexicon. The Economic Teacher: A transnational and diachronic study of treatises and textbooks of economics (18th to 20th century). Project: Translations and the Circulation of Economic Ideas. Available from https://www.shorturl.at/luyT2 [Accessed on 23 August 2022].

Hanks, P. (2012) The corpus revolution in lexicography. International Journal of Lexicography, 25 (4): 398-436. DOI: https://doi.org/10.1093/ijl/ecs026

House, J. (2012) Subjectivity in English lingua franca interactions. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 139-156). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921_008

Kilgarriff, A. and Grefenstette, G. (2003) Introduction to a special issue on the web as corpus. Computational Linguistics, 29 (3): 333-347. Available from https://aclanthology.org/J03-3001.pdf [Accessed on 3 November 2022]. DOI: https://doi.org/10.1162/089120103322711569

Kilgarriff, A. and Rychly, P. (2008) Finding the Words Which Are Most X. Available from https://www.sketchengine.eu/wp-content/uploads/2015/05/Finding_the_words_2008.pdf [Accessed on 7 September 2022].

Langacker, R. W. (2009) Investigations in Cognitive Grammar. Berlin: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110214369

Malavska, V. (2016) Genre of an academic lecture. International Journal on Language Literature and Culture in Education, 3 (2): 56-84. Available from https://www.researchgate.net/publication/310815820_Genre_of_an_Academic_Lecture [Accessed on 5 May 2022]. DOI: https://doi.org/10.1515/llce-2016-0010

McEnery, T. and Brookes, G. (2022) Building a written corpus: what are the basics? In A. O’Keeffe and M. J. McCarthy (eds.) The Routledge Handbook of Corpus Linguistics, 2nd ed. (n. p.). New York: Routledge.

McEnery, T. and Hardie, A. (2012) Corpus Linguistics: method, theory and practice. New York: Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511981395

McEnery, T. and Wilson, A. (1996) Corpus Linguistics. Edinburgh: Edinburgh University Press.

McEnery, T. and Xiao, R. (2005) Character encoding in corpus construction. In M. Wynne (ed.) Developing Linguistic Corpora: a guide to good practice. Available from http://icar.cnrs.fr/ecole_thematique/contaci/documents/Baude/wynne.pdf [Accessed on 1 September 2022].

McEnery, T., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: an advanced resource book. New York: Routledge. Available from https://www.jbe-platform.com/content/journals/10.1075/ijcl.11.4.09w [Accessed on 23 August 2022].

Pho, P. D. (2012) Authorial stance in research article abstracts and introductions from two disciplines. In N. Baumgarten, I. Du Bois and J. House (eds.) Subjectivity in Language and in Discourse (pp. 97-114). Bingley: Emerald Group Publishing Limited. DOI: https://doi.org/10.1163/9789004261921_006

Reppen, R. (2022) Building a corpus: what are the key considerations? In A. O’Keeffe and M. J. McCarthy (eds.) The Routledge Handbook of Corpus Linguistics, 2nd ed. (n. p.). New York: Routledge. Available from https://www.taylorfrancis.com/chapters/edit/10.4324/9780367076399-2/building-corpus-key-considerations-randi-reppen [Accessed on 15 September 2022]. DOI: https://doi.org/10.4324/9780367076399-2

Sampson, G. (2013) The empirical trend. International Journal of Corpus Linguistics, 18 (2): 281-289. DOI: https://doi.org/10.1075/ijcl.18.2.05sam

Sinclair, J. (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Sinclair, J. (2005) Corpus and text – basic principles. In M. Wynne (ed.) Developing Linguistic Corpora: a guide to good practice (pp. 1-16). Oxford: Oxbow Books. Available from https://users.ox.ac.uk/~martinw/dlc/index.htm [Accessed on 6 November 2022].

Tantucci, V. (2021) Language and Social Minds: the semantics and pragmatics of intersubjectivity. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/9781108676441

Vinčela, Z. (2017) Canadian dollar in the English language varieties: corpus based study. Baltic Journal of English Language, Literature and Culture, 7: 161-171. DOI: https://doi.org/10.22364/BJELLC.07.2017.10

Weisser, M. (2016) Practical Corpus Linguistics. Hoboken, New Jersey: Wiley Blackwell. DOI: https://doi.org/10.1002/9781119180180

Hyperbase 10. Available from http://ancilla.unice.fr/pages/logiciel/ [Accessed on 7 September 2022].

Sketch Engine. Available from https://www.sketchengine.eu/ [Accessed on 7 September 2022].

Published

2023-05-15

How to Cite

Korneliusa, K. and Vinčela, Z. (2023). The Application of Diachronic Corpus Compilation Principles in a Pilot Study of Subjectivity. Baltic Journal of English Language, Literature and Culture, 13, 48-68. https://doi.org/10.22364/BJELLC.13.2023.04