Sentence Summarization in ATraNoS

ATraNoS (Automatic Transcription and Normalization of Speech) is a four-year research project sponsored by IWT, the Belgian institute for the promotion of innovation in science and technology in Flanders. The project aims to contribute to the development of better speech transcription products and uses TV subtitle generation as its case study. The main task of the universities of Antwerp and Leuven in this project is to develop a sentence summarization system for Dutch.

In the summarization part of the project we work along three tracks. In the first track we train a machine learner to predict deleted and replaced phrases, using a parallel corpus of transcriptions and subtitles. In the second track we apply handcrafted phrase deletion rules to generate sentence summaries. The third track uses statistical relevance information to determine which phrases can be removed from sentences. Work in all three tracks relies on a good shallow parsing analysis of the transcribed Dutch sentences.
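To make the second track more concrete, the following is a minimal sketch of phrase deletion over a shallow-parsed sentence. It is not the ATraNoS implementation: the chunk labels, the rule set `REMOVABLE`, the character budget `max_chars`, and the function names are all assumptions chosen for illustration. The sketch assumes the shallow parser delivers a sentence as a list of labeled chunks and that a subtitle must fit a length limit.

```python
from typing import List, Set, Tuple

# A chunk is a (label, text) pair, as a shallow parser might produce it.
Chunk = Tuple[str, str]

# Hypothetical handcrafted rule: these chunk types may be deleted.
REMOVABLE: Set[str] = {"ADVP", "PP"}


def render(chunks: List[Chunk]) -> str:
    """Join the chunk texts back into a sentence string."""
    return " ".join(text for _, text in chunks)


def compress(chunks: List[Chunk], max_chars: int,
             removable: Set[str] = REMOVABLE) -> str:
    """Delete removable chunks, last one first, until the sentence fits."""
    kept = list(chunks)
    for i in range(len(kept) - 1, -1, -1):
        if len(render(kept)) <= max_chars:
            break  # short enough, stop deleting
        if kept[i][0] in removable:
            del kept[i]
    return render(kept)


# Illustrative input, not taken from the project corpus.
sentence = [
    ("NP", "de minister"),
    ("VP", "heeft"),
    ("ADVP", "gisteren"),
    ("PP", "in Brussel"),
    ("NP", "het akkoord"),
    ("VP", "ondertekend"),
]

print(compress(sentence, max_chars=45))
```

In this sketch the deletion order is simply right to left. A system along the lines of the third track could instead rank the removable chunks by a relevance score and delete the least relevant ones first.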

In ATraNoS we have developed a parallel sentence-aligned corpus of transcriptions and the associated subtitles from broadcasts of the 19:00 VRT Flemish TV news. Currently this corpus contains 436,104 words. We work closely with the MUSA project, which aims at automatic subtitle generation (from English to English, French and Greek) for programmes broadcast by the BBC World Service.

Demos

Associated publications

People


Last update: September 06, 2006. erikt@science.uva.nl