
The city of Antwerp, photograph by Veerle Smedts

Research Projects

Overview of current and (selected) past projects.

Projects in focus:


A full CV is available (for archival purposes) [pdf].

Short CV: Walter Daelemans is professor of Computational Linguistics at the University of Antwerp, where he directs the CLiPS computational linguistics research group. His research interests are in machine learning of natural language, for example in the development of Memory-Based Language Processing (CUP, 2005); computational psycholinguistics, especially exemplar-based alternatives to mental rules as representations explaining language acquisition and processing; computational stylometry, with a focus on authorship attribution and author profiling from text; and language technology applications, for example biomedical information extraction and cybersecurity systems for social networks.

Affiliations and Awards

Recent keynote lectures at workshops and conferences: SLSP 2016 Czech Republic; ISMW FRUCT 2016 Saint Petersburg; EVALITA 2016 Italy; IPRA 2015 Belgium; CLEF-PAN 2015 France; ES3LOD 2014 Iceland; CICLing 2013 Greece; TSD 2012 Czech Republic; CLEF-QA 2011 The Netherlands; RANLP 2009 Bulgaria; International Summer School on Embodied Language Games and Construction Grammar 2009 Italy; LSSA/SAALA/SAALT joint conference 2007 South Africa; NODALIDA 2007 Estonia; CoNLL X 2006 USA; EsTAL 2004 Spain.


Overview of current teaching.

Recent tutorial: Memory-Based Models of Language Acquisition and Processing (with Antal van den Bosch), IASCL, Amsterdam, 2014.

Editorial Boards

Digital Scholarship in the Humanities (Editorial Board); CLIN Journal (Editorial Board, Production Editor); Language and Computation corner of Journal of Logic and Computation (Associate Editor); Logic Journal of the IGPL (Editorial Board); member standing reviewer committee Transactions of the Association for Computational Linguistics.

Previous editorial board memberships: Computational Linguistics, Machine Learning Journal, IEEE Transactions on Speech and Audio Processing (associate editor), Journal of Artificial Intelligence Research (associate editor), Journal of Machine Learning Research (Editorial Board), Research on Language and Computation (associate editor), Computational Cognitive Science (managing editor), Linguistica Antverpiensia (Advisory Board), ...

Organization and Scientific Advice

Chair of the European Chapter of the Association for Computational Linguistics (EACL). Check out the EACL 2017 website.

Member organizing committee SIGDAT, ACL's Special Interest Group for linguistic data and corpus-based approaches to NLP.

SIGNLL: ACL's Special Interest Group on Natural Language Learning. Advisory board member (co-founder of the SIG and President 1999-2003). Co-founder of the CoNLL Conference Series with the CoNLL shared tasks.

CLIF (Computational Linguistics in Flanders). Research community on Language Technology and Computational Linguistics. (Co-initiator, member working group).

President of META-TRUST, representing the network of excellence META-NET and the Multilingual Europe Technology Alliance (META).

Chairman STIL (Stichting Toepassing Inductieve Leertechnieken).

Co-founder of Textkernel and of Textgain (spin-off companies).

Software Development

Grafon-D, Chyp, TDTDT

Software developed (1985-1987) for my PhD included LISP / KRS code for Dutch word-level language technology: text to phonetics (GRAFON-D), hyphenation and syllabification (CHYP), morphological analysis and synthesis, an inheritance-based object-oriented lexical knowledge base, and applications in spelling correction and verb inflection tutoring (TDTDT). The approach was largely frame-based (rules and objects, heavy use of multiple inheritance). Runs on Symbolics Lisp Machines, if you can find one.

TiMBL Memory-Based Learning package

I started implementing a LISP version of memory-based language processing (eventually called WAMBL) after I arrived in Tilburg in 1989, combining k-nearest-neighbor classification (k-NN) and the value difference metric (VDM), as in Stanfill and Waltz’s memory-based reasoning, with information gain weighting of features (information gain as used in ID3, Quinlan’s decision tree learning algorithm). Later, Antal van den Bosch and I developed the IGTree algorithm (an oblivious decision tree learner without pruning that approximates memory-based learning efficiently). Based on an early reimplementation by Peter Berck in C, Ko van der Sloot developed TiMBL in C++. TiMBL 1.0 was released in 1998. See the TiMBL reference guide for more history and credits.
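To give a feel for the idea, here is a minimal Python sketch of nearest-neighbor classification with information gain feature weighting. It is an illustration only, not TiMBL's code: it uses a plain overlap metric on symbolic features instead of the value difference metric, and the function names and toy data are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Entropy reduction obtained by knowing the value of one feature."""
    by_value = {}
    for features, label in zip(instances, labels):
        by_value.setdefault(features[feature_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def classify(instances, labels, query, k=1):
    """Majority vote over the k stored instances closest to the query.

    Distance is the sum of the information gain weights of all
    mismatching features (weighted overlap; an assumption of this sketch).
    """
    weights = [information_gain(instances, labels, i)
               for i in range(len(query))]

    def distance(stored):
        return sum(w for w, a, b in zip(weights, stored, query) if a != b)

    ranked = sorted(zip(instances, labels), key=lambda pair: distance(pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy memory: feature 0 predicts the class, feature 1 is noise.
INSTANCES = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
LABELS = ["A", "A", "B", "B"]

prediction = classify(INSTANCES, LABELS, ("a", "z"), k=1)
```

In this toy memory the second feature gets weight zero, so a mismatch on it does not count against a stored instance. All training instances are simply kept in memory; there is no abstraction step.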

Mbt Memory-Based Tagging

This is a wrapper around TiMBL for sequence tagging. It incorporates facilities for defining left and right context features in a flexible way. It was originally designed for part-of-speech tagging.
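The general idea of left and right context features can be sketched in a few lines of Python. This is an assumption-laden illustration, not Mbt's actual feature format: it uses word forms only, and the function name, the window sizes and the padding symbol are hypothetical.

```python
def context_windows(tokens, left=2, right=1, pad="_"):
    """Turn a token sequence into one fixed-width feature tuple per token.

    Each tuple holds `left` preceding tokens, the focus token, and
    `right` following tokens; positions outside the sequence are padded.
    """
    padded = [pad] * left + list(tokens) + [pad] * right
    width = left + 1 + right
    return [tuple(padded[i:i + width]) for i in range(len(tokens))]


windows = context_windows(["the", "cat", "sleeps"], left=1, right=1)
# windows[1] == ("the", "cat", "sleeps"): the focus token with its neighbors
```

Each tuple can then be handed to a classifier as one instance, with the tag of the focus token as its class.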

Memory-Based Shallow Parsing

If you cut up the parsing process into different disambiguation and segmentation tasks, each of them amenable to supervised classification-based learning (e.g., using TiMBL and Mbt), you have the basic idea underlying memory-based shallow parsing. It contains classifiers for (at least) tokenization, part-of-speech tagging, phrase chunking, and grammatical relation finding, but the idea can be and has been further extended to include PP-attachment, named entity recognition, semantic role labeling, dependency parsing, etc. The original MBSP is described in Daelemans et al. 1999 and Buchholz et al. 1999. Since then many different versions have been built by different people.

One for English is MBSP.

For Dutch there is Frog (previously known as Tadpole).
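As an illustration of how a segmentation task such as phrase chunking becomes a classification task: a classifier can assign one tag per token, and the chunks are read off the tag sequence afterwards. The Python sketch below assumes the common IOB encoding (B-X begins a chunk of type X, I-X continues it, O is outside any chunk); that encoding, the function name and the example tags are assumptions for the example, not necessarily what any of the packages above use internally.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, tokens) pairs from per-token IOB tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        # An I- tag whose type differs from the open chunk also starts a new one.
        starts_new = prefix == "B" or (prefix == "I" and chunk_type != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                chunks.append((current_type, current_tokens))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = chunk_type
            current_tokens = current_tokens + [token]
    if current_tokens:
        chunks.append((current_type, current_tokens))
    return chunks


# Hypothetical classifier output for a six-token sentence.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
tags = ["B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
chunks = iob_to_chunks(tokens, tags)
```

The tags themselves would come from a tagger trained on windowed instances, so the whole pipeline stays a cascade of classifiers.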

Packages that do Everything

I had some minor input on the design of Tom De Smedt's infamous Pattern package.

Last modified: Wed Jan 4 18:17:38 CET 2017