The compression system compresses a sentence by deleting words or phrases from it. The system was developed for Dutch within the NEON and Daeso projects, and is based on earlier research for English in the MUSA project.
The system takes a hybrid approach that combines rules with statistics. First, each sentence is parsed with the memory-based shallow parser. The parser tokenizes the sentence and assigns a part-of-speech tag, an IOB chunk tag and a lemma to every token. The compression system then uses the predicted chunk tags to determine which words or phrases are candidates for removal. The following types of phrases or words are marked as candidates for removal: