Semantic-syntactic classification of Croatian verbs

Funded by: Croatian Science Foundation

Project number: IP-2022-10-8074

The project is conducted at: Institute for the Croatian Language

Duration: from 31 December 2023 to 30 December 2027

Research Team:

Mia Batinić Angster, Department of Linguistics, University of Zadar

Matea Birtić, Institute for the Croatian Language

Branimir Belaj, Faculty of Humanities and Social Sciences, University of Osijek

Ivana Brač, Institute for the Croatian Language, head of the project

Daria Lazić, Institute for the Croatian Language

Maja Matijević, Institute for the Croatian Language

Iva Nazalević Čučević, Faculty of Humanities and Social Sciences, University of Zagreb

Ana Ostroški Anić, Institute for the Croatian Language

Kristina Štrkalj Despot, Institute for the Croatian Language

Siniša Runjaić, Institute for the Croatian Language

Project summary:

The project aims to determine the semantic classes to which the 500 most frequent Croatian verbs belong, along with their prototypical syntactic patterns and semantic roles. It also explores, from a theoretical perspective, the relationship between the semantics and syntax of Croatian verbs within a semantic class and across classes. An equally important result is a database that will contain a detailed description of verbs’ argument structure. Each verb sense will be associated with a semantic class, and the complement types and semantic roles will be determined on the basis of corpus examples. The project will investigate semantic relations among verbs, such as synonymy, antonymy, and troponymy. Additionally, it will explore the role of verbs in cognitive mechanisms such as metaphor and metonymy, as well as syntactic alternations and other grammatical phenomena.
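To make the planned database entry more concrete, the following is a minimal sketch of how a single verb sense might be recorded. It is an illustration only, not the project's actual schema: the class names, field names, the label "transfer" for the semantic class, and the example sentence (constructed, not taken from a corpus) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Complement:
    """One slot in a syntactic pattern (hypothetical schema)."""
    form: str            # type of complement, e.g. "NP-accusative"
    semantic_role: str   # semantic role, e.g. "Theme"
    obligatory: bool = True


@dataclass
class VerbSense:
    """One sense of a verb, linked to a semantic class (hypothetical schema)."""
    lemma: str
    gloss: str
    semantic_class: str
    complements: list[Complement] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def pattern(self) -> str:
        """Return the syntactic pattern as 'form:role' pairs joined by ' + '."""
        return " + ".join(f"{c.form}:{c.semantic_role}" for c in self.complements)


# Illustrative entry for "dati" (to give); all labels are assumptions.
dati = VerbSense(
    lemma="dati",
    gloss="to give",
    semantic_class="transfer",
    complements=[
        Complement("NP-nominative", "Agent"),
        Complement("NP-accusative", "Theme"),
        Complement("NP-dative", "Recipient"),
    ],
    examples=["Ana je dala knjigu Marku."],  # constructed example sentence
)

print(dati.pattern())
```

Keeping the complement type and the semantic role together in one slot reflects the summary's statement that both are determined for each verb sense. A real database would likely need further fields, for instance for alternations or links between related senses.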

Categorizing verbs into semantic classes and providing valency descriptions will reveal similarities and differences among verbs within a semantic class and between different classes. This will lead to a better understanding of the relationship between syntax and semantics. The structured and clear presentation of results in the database and publications will facilitate the integration of the Croatian language into comparative research and potential data exchange with other similar databases.

The project is crucial for natural language processing because it provides a reliable and thorough description of syntactic patterns, semantic roles, and differentiated verb senses. The results can be used for parsing, automatic text annotation, semantic role labeling, improving machine translation tools, and creating language learning materials, among other applications. The outcomes will benefit Croatian language professors, students of all linguistic disciplines, Croatian and foreign linguists, and non-native speakers.

The database will continue to be updated even after the project's completion, with the goal of becoming the most comprehensive database of verbs in the Croatian language.