Automatic Generation of Lexico-Grammatical Tests for Russian as a Foreign Language Using Predictive Language Models

Authors

A. V. Belyi, O. A. Mitrofanova, N. A. Dubinina

DOI:

https://doi.org/10.21638/spbu30.2023.212

Abstract

The authors argue that in teaching a foreign language, one of the basic needs of participants in the educational process is a sufficient amount of learning material. Among the tasks that support the acquisition of lexical and grammatical units, multiple-choice gap-filling exercises have become particularly popular. Creating unique tasks manually is labor-intensive. In contrast to English, algorithms for exercise generation are not being actively developed for Russian, despite the existing need. The authors therefore propose a method for the automatic generation of tasks of this type for Russian as a foreign language (RFL). The proposed method is based on distributional semantic models such as word2vec and makes it possible to create tasks from authentic texts; it does not depend on the genre and style of the text or on the corresponding language level, and it can be easily adapted to other languages. To train the word2vec model, a corpus of children’s and educational literature was compiled to emulate the language experience of students. A web application for teachers was also launched as part of the work. To assess the consistency and relevance of the generated tasks, two experiments were conducted: in the first, naive native speakers of Russian were interviewed, while in the second, experts in RFL were surveyed. The high degree of correctness of the tasks and the selected distractors is supported by high precision (0.8) and recall (0.91) scores. The experts also noted the convenience of the web application.
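
The general idea of choosing distractors with a distributional model can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny hand-made vectors stand in for a trained word2vec model, and the names `TOY_VECTORS`, `pick_distractors`, and `make_gap_item` are hypothetical. The sketch also ignores morphological agreement, which a real system for Russian would presumably need to handle.

```python
import math
import random

# Toy 3-dimensional "embeddings" standing in for a trained word2vec model.
# The values are invented purely for illustration.
TOY_VECTORS = {
    "книга": (0.9, 0.1, 0.0),
    "журнал": (0.8, 0.3, 0.1),
    "газета": (0.7, 0.4, 0.1),
    "тетрадь": (0.6, 0.2, 0.5),
    "собака": (0.0, 0.9, 0.3),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_distractors(target, vectors, k=3):
    """Return the k words most similar to `target`, excluding the target itself."""
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def make_gap_item(sentence, target, vectors, k=3, seed=0):
    """Build a multiple-choice gap-filling item from one sentence."""
    if target not in sentence:
        raise ValueError("target word does not occur in the sentence")
    stem = sentence.replace(target, "_____", 1)
    options = pick_distractors(target, vectors, k) + [target]
    random.Random(seed).shuffle(options)  # fixed seed keeps the order reproducible
    return {"stem": stem, "options": options, "answer": target}


if __name__ == "__main__":
    item = make_gap_item("На столе лежит книга", "книга", TOY_VECTORS)
    print(item["stem"])
    print(item["options"])
```

In this toy setup the semantically distant word ("собака") is never offered as an option, while the nearest neighbours of the target become distractors. A practical system would likely also filter out candidates that are too close to the target, such as synonyms that fit the gap equally well.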

Keywords:

teaching Russian as a foreign language, automatic generation of language exercises, lexical and grammatical exercises, gap-filling, multiple-choice

Published

2024-04-16

How to Cite

Belyi, A. V., Mitrofanova, O. A., & Dubinina, N. A. (2024). Automatic Generation of Lexico-Grammatical Tests for Russian as a Foreign Language Using Predictive Language Models. The World of Russian Word, (2), 108–118. https://doi.org/10.21638/spbu30.2023.212

Section

Methodology