Sentence Summarization in ATraNoS
ATraNoS
(Automatic Transcription and Normalization of Speech) is a four-year
research project sponsored by
IWT,
the Belgian institute for the promotion of innovation in science and
technology in Flanders.
The project's aim is to contribute to the development of better products
for speech transcription, with TV subtitle generation as a case study.
The main task of the universities of Antwerp and Leuven in this
project is to develop a sentence summarization system for Dutch.
In the summarization part of the project we work along three tracks.
In the first track we train a machine learner to predict deleted and
replaced phrases based on a parallel corpus of transcriptions and
subtitles.
In the second track we apply handcrafted phrase deletion rules for
generating sentence summaries.
The third track uses statistical relevance information to determine
which phrases can be removed from sentences.
Work in all tracks relies on a good shallow parsing analysis of the
transcribed Dutch sentences.
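As an illustration of the second track, the sketch below applies a phrase deletion rule to a sentence that has already been split into chunks. It is a minimal sketch under stated assumptions, not the ATraNoS rule set: the chunk labels (NP, VP, PP, ADVP), the DELETABLE set, the right-to-left deletion order, the character limit and the example sentence are all invented for illustration.

```python
from typing import List, Tuple

# A chunk is a (label, text) pair, as a shallow parser might produce.
Chunk = Tuple[str, str]

# Hypothetical rule: adverbial and prepositional chunks may be dropped.
DELETABLE = {"ADVP", "PP"}


def chunk_text(chunks: List[Chunk]) -> str:
    """Join the chunk texts back into a sentence string."""
    return " ".join(text for _, text in chunks)


def reduce_sentence(chunks: List[Chunk], max_chars: int) -> str:
    """Drop deletable chunks, right to left, until the sentence fits."""
    kept = list(chunks)
    for i in range(len(kept) - 1, -1, -1):
        if len(chunk_text(kept)) <= max_chars:
            break  # short enough, stop deleting
        if kept[i][0] in DELETABLE:
            del kept[i]
    return chunk_text(kept)


# Invented example sentence with assumed chunk labels.
example = [
    ("NP", "de minister"),
    ("VP", "heeft"),
    ("ADVP", "gisteren"),
    ("PP", "in Brussel"),
    ("NP", "een akkoord"),
    ("VP", "bereikt"),
]
print(reduce_sentence(example, max_chars=40))
```

A real rule set would presumably also look at the grammatical relations between chunks, so that obligatory phrases are never removed. This sketch only checks the chunk label.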
In ATraNoS we have developed a parallel sentence-aligned corpus of
transcriptions and the associated subtitles from broadcasts of the
19:00
VRT
Flemish TV news.
Currently this corpus contains 436,104 words.
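A sentence-aligned pair of transcription and subtitle can be turned into word-level training labels for a machine learner that predicts deleted and replaced phrases. The sketch below shows one possible way to do this with Python's standard difflib module. It is an assumption for illustration, not the alignment method used in the project. The label names KEEP, DELETE and REPLACE and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Map difflib opcode tags to hypothetical training labels.
TAG_TO_LABEL = {"equal": "KEEP", "delete": "DELETE", "replace": "REPLACE"}


def label_words(transcript: List[str], subtitle: List[str]) -> List[Tuple[str, str]]:
    """Label every transcript word by comparing it with the subtitle."""
    matcher = SequenceMatcher(a=transcript, b=subtitle, autojunk=False)
    labels: List[Tuple[str, str]] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "insert":
            # Words that occur only in the subtitle have no
            # transcript word to attach a label to.
            continue
        labels.extend((word, TAG_TO_LABEL[tag]) for word in transcript[i1:i2])
    return labels


# Invented example pair.
transcript = "de minister heeft gisteren in Brussel een akkoord bereikt".split()
subtitle = "de minister heeft een akkoord bereikt".split()
for word, label in label_words(transcript, subtitle):
    print(word, label)
```

In this example the words "gisteren in Brussel" receive the label DELETE and all other words receive KEEP.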
We work closely with the
MUSA
project, which aims at automatic subtitle generation (from English to
English, French and Greek) for programmes broadcast by
BBC World Service.
Demos
- http://www.cnts.ua.ac.be/cgi-bin/atranos
The ATraNoS Dutch summarization demo provides access to the current
versions of the two Antwerp sentence summarization systems, one based on
machine learning and one on handcrafted rules.
- http://www.cnts.ua.ac.be/cgi-bin/nlsp
The shallow parser for Dutch can be tested separately.
It performs part-of-speech tagging, lemmatization, text chunking,
relation finding and person name recognition.
- http://www.cnts.ua.ac.be/cgi-bin/musa (demo currently unavailable)
The MUSA English Summarization Demo provides access to the current
version of the sentence summarizer for English.
Associated publications
- Erik F. Tjong Kim Sang, Walter Daelemans and Anja Höthker,
Reduction of Dutch Sentences for Automatic Subtitling.
In: Proceedings of CLIN-2003, University of Antwerp,
Antwerp Papers in Linguistics, 111, 2004, pp. 109-123.
[pdf]
- Walter Daelemans, Anja Höthker and Erik Tjong Kim Sang,
Automatic Sentence Simplification for Subtitling in Dutch and English.
In: Proceedings of LREC-2004, Lisbon, Portugal, 2004, pp. 1045-1048.
[pdf]
- Vincent Vandeghinste and Erik Tjong Kim Sang,
Using a Parallel Transcript/Subtitle Corpus for Sentence Compression.
In: Proceedings of LREC-2004, Lisbon, Portugal, 2004, pp. 231-234.
[pdf]
- Vincent Vandeghinste,
Rule-based Non-recursive Chunking for Dutch.
Talk presented at CLIN-2003, Antwerp, Belgium, December 2003.
- Erik F. Tjong Kim Sang,
Summarizing Dutch Sentences for Automatic Subtitling.
Talk presented at CLIN-2003, Antwerp, Belgium, December 2003.
[ps.gz]
[pdf]
- Vincent Vandeghinste,
Aanpassen van de linguïstische modules voor
ondertiteling: Beschrijving van de huidige
zinsreductiesoftware en Evaluatie
(in Dutch), Internal ATraNoS report WP4-19, 5 pages,
October 2003.
- Erik F. Tjong Kim Sang,
Generating Subtitles from Linguistically Annotated Text,
Internal ATraNoS report WP4-12, 19 pages, October 2003.
- Erik F. Tjong Kim Sang,
A Baseline Subtitle Generator,
Internal ATraNoS report WP4-10, 6 pages, April 2003.
- Erik F. Tjong Kim Sang,
Summarizing Sentences for Automatic Subtitling.
Talk presented at CLIN-2002, Groningen, The Netherlands, November 2002.
[ps.gz]
[pdf]
- Erik F. Tjong Kim Sang,
Alignment of Transcribed Text with Subtitles,
Internal ATraNoS report WP4-01, 6 pages, April 2002.
- Vincent Vandeghinste,
From verbatim transcriptions to subtitles: Input / Output
characteristics,
Internal ATraNoS report WP4-08, 16 pages, April 2002.
People
- Erik Tjong Kim Sang,
ATraNoS postdoc, Antwerp.
- Vincent Vandeghinste,
ATraNoS PhD student, Leuven.
- Anja Höthker,
MUSA PhD student, Antwerp.
- Walter Daelemans,
ATraNoS & MUSA project leader, Antwerp.
- Frank Van Eynde,
ATraNoS project leader, Leuven.
Last update: September 06, 2006.
erikt@science.uva.nl