Seminar: "Models and Resources for Attention-based Unsupervised Word Segmentation"

Marcely Zanon Boito will visit LIA on July 2 at 2 PM to give an invited talk on "Models and Resources for Attention-based Unsupervised Word Segmentation".
A short abstract of the presentation is given below. The talk will be held in person in the S6 classroom.
 
 
Short Abstract
Documenting languages helps prevent the extinction of endangered dialects, many of which are otherwise expected to disappear by the end of the century. When documenting oral languages, which have no written form, unsupervised word segmentation from speech is a useful yet challenging task. It consists of producing time stamps that slice utterances into smaller segments corresponding to words.
In this seminar, I will present our speech processing pipeline, which produces word segmentations in a language documentation setting. This setting implies working with minimal amounts of data: the unsupervised word segmentation task is tackled using only 4 hours of speech. To compensate for this scarcity of data, we use an attention-based approach that leverages aligned translations to ground the discovered word segments.
 
Short Bio
Marcely Zanon Boito is a computer scientist and a PhD student at the University Grenoble Alpes (UGA). During both her Master's and PhD, she was supervised by Professor Aline Villavicencio (University of Sheffield) and Professor Laurent Besacier (UGA, later Naver Labs Europe). Her research interests include low-resource approaches to natural language processing, with a particular interest in speech processing.