A data repository for the management of dynamic linguistic datasets

Thomas Gaillat, Leonardo Contreras Roa, Juvénal Attoumbré

Abstract

This paper examines the use of Nakala, a dynamic database technology, for managing language corpora. We present our ongoing work on storing and classifying the multimedia documents of a corpus of oral and written language learner productions using uniform resource identifiers. The architecture supports query APIs compatible with R packages and other tools, which will facilitate the generation of linguistically enriched datasets for more effective corpus-based study of language acquisition.

Date

Sep 27, 2021 — Sep 29, 2021

Event

CLARIN Annual Conference 2021

data repository, database management, language resource, learner corpus

Leonardo Contreras Roa

Associate Professor in English Phonetics and Phonology
