Previous abstract | Contents | Next abstract

Memory-based one-step named-entity recognition: Effects of seed list features, classifier stacking, and unannotated data

We present a memory-based named-entity recognition system that chunks and labels named entities in a one-shot task. Training and testing on CoNLL-2003 shared task data, we measure the effects of three extensions. First, we incorporate features that signal the presence of wordforms in external, language-specific seed (gazetteer) lists. Second, we build a second-stage stacked classifier that corrects first-stage output errors. Third, we add selected instances from classified unannotated data to the training material. The system that incorporates all attains an overall F-rate on the final test set of 78.20 on English and 63.02 on German.


Iris Hendrickx and Antal van den Bosch, Memory-based one-step named-entity recognition: Effects of seed list features, classifier stacking, and unannotated data. In: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 176-179. [ps] [ps.gz] [pdf] [bibtex]
Last update: June 11, 2003. erikt@uia.ua.ac.be