Assessing complexity of Russian legal texts: The model’s architecture

Authors

  • Olga V. Blinova St. Petersburg State University; HSE University

DOI:

https://doi.org/10.24412/1811-1629-2022-2-4-13

Abstract

The paper describes a metrics-based model for assessing the complexity of Russian legal texts. The model uses 130 metrics divided into the following categories: “basic metrics”, “readability formulas”, “words of different part-of-speech classes”, “n-grams of part-of-speech tags”, “frequency of lemmas”, “word-building patterns”, “grammemes”, “lexical and semantic features, multi-word expressions”, “syntactic features”, and “cohesion assessments”. Two metrics take into account hypertext links and the presence of vague contexts. The model can evaluate structural, conceptual, and hypertextual complexity; it includes both non-specific metrics traditionally used to predict complexity and style-specific metrics developed with the peculiarities of official texts in mind. When evaluating morphological and syntactic features, the model relies on the markup layers produced by UDPipe (“rusyntagrus”) and pymorphy2. The model also draws on a number of user dictionaries, including a list of lexical means of text deixis, a list of graphic abbreviations (1,500 units), a list of acronyms (2,000 units), a list of legal terms (10,000 units), a list of abstract lemmas (17,000 units), a list of lexical indicators of deontic possibility and necessity, and a list of light verb constructions. The values of the complexity metrics were calculated for all documents of the CorCodex law corpus, the CorDec corpus of Constitutional Court decisions, and the CorRIDA corpus of local acts (about 8 million tokens in total). The annotated legal corpora, complexity metrics, and user dictionaries are available for download from plaindocument.org.
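To make the idea of a metrics-based model concrete, the sketch below computes a handful of values of the kind that might fall under “basic metrics” and dictionary-based lexical features. It is an illustration only, not the author's implementation: the function name `basic_metrics`, the regex tokenizer, the naive sentence splitter, and the four-item `ABBREVIATIONS` set are all assumptions made for this example, and the actual model relies on UDPipe and pymorphy2 markup and on much larger user dictionaries.

```python
import re

# Hypothetical miniature stand-in for a user dictionary of abbreviations;
# the list described in the abstract contains about 1,500 units.
ABBREVIATIONS = {"ст", "п", "ч", "рф"}


def basic_metrics(text: str) -> dict:
    """Compute a few simple surface metrics for a text (illustrative only)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    # Abbreviations written with a period would be split incorrectly here.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenization: runs of word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())

    n_sentences = len(sentences)
    n_tokens = len(tokens)
    if n_sentences == 0 or n_tokens == 0:
        # Avoid division by zero on empty input.
        return {
            "n_sentences": n_sentences,
            "n_tokens": n_tokens,
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
            "abbreviation_share": 0.0,
        }

    return {
        "n_sentences": n_sentences,
        "n_tokens": n_tokens,
        # Average number of tokens per sentence.
        "mean_sentence_length": n_tokens / n_sentences,
        # Average token length in characters.
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens,
        # Share of distinct word forms among all tokens.
        "type_token_ratio": len(set(tokens)) / n_tokens,
        # Share of tokens found in the (hypothetical) abbreviation list.
        "abbreviation_share": sum(t in ABBREVIATIONS for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    sample = "Суд отказал в иске. Решение суда вступило в силу РФ."
    for name, value in basic_metrics(sample).items():
        print(f"{name}: {value}")
```

A full pipeline along the lines the abstract describes would presumably replace the regex tokenizer with lemmas and part-of-speech tags from a morphological analyzer, which is what categories such as “frequency of lemmas” or “n-grams of part-of-speech tags” require.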

Keywords:

Russian legal texts, complexity assessment model, linguistic metrics, readability

Published

2022-06-01

How to Cite

Blinova, O. V. (2022). Assessing complexity of Russian legal texts: The model’s architecture. The World of Russian Word, (2), 4–13. https://doi.org/10.24412/1811-1629-2022-2-4-13

Section

Linguistics