Natural Language, Legal Hurdles: Navigating the Complexities in Natural Language Processing Development and Application
DOI:
https://doi.org/10.22364/jull.17.03
Keywords:
natural language processing, copyright, artificial intelligence
Abstract
This article examines the legal challenges of developing and deploying Natural Language Processing (NLP) technologies, focusing on the European Union’s legal framework, in particular the DSM Directive, the InfoSoc Directive, and the Artificial Intelligence Act. It addresses the legal status and accessibility of language data and the development of NLP technologies under both contractual and exception-based models. The authors acknowledge the partial truth in the saying, “US innovates, China replicates, and the EU regulates”. Although Europe’s AI sector is a global competitor and its strict regulations ensure ethical standards and data protection, these regulations may not necessarily boost competitiveness: they can introduce complexities that may inhibit innovation relative to regions with more lenient policies.
License
Copyright (c) 2024 University of Latvia
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.