Project information

The fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance as paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of NLP applications. For this purpose, tools will be developed for the automatic alignment and classification of semantic relations (between words, phrases and sentences) in Dutch, together with a Dutch text-to-text generation application that fuses related sentences into a single grammatical sentence; the result may be a generalization, a specification or a reformulation of the input sentences. To facilitate development and testing of these tools, an annotated monolingual Dutch parallel/comparable corpus of 1M words will be developed, consisting of pairs of texts that express comparable information. The utility of the resources and tools will be demonstrated in three applications: (1) question answering (improved recall, more complete answers), (2) information extraction (improved recall), and (3) summarization (going beyond extraction: sentence compression, sentence fusion, anaphora resolution).

Abstract (translated from Dutch):

The same information can be expressed in language in many different ways. Knowledge about paraphrasing (same semantic content, different wording) and entailment (one expression implies the other) can solve this problem to some extent. In DAESO, techniques are developed that make it possible to establish such semantic relations between texts automatically. The usefulness of the approach will be investigated in the context of several applications: question answering, information extraction and automatic text summarization.

Project Leader(s): 
Walter Daelemans
External Collaborator(s): 

Erwin Marsi (UvT)

Emiel Krahmer (UvT)

Period: 
01/06/2006 - 31/05/2009

Funding: 
Nederlandse Taalunie (STEVIN)
