Named Entity Recognition with Character-Level Models

We discuss two named-entity recognition models that use characters and character n-grams either exclusively or as an important part of their data representation. The first model is a character-level HMM with minimal context information, and the second is a maximum-entropy conditional Markov model with substantially richer context features. Our best model achieves an overall F1 of 86.07% on the English test data (92.31% on the development data). This number represents a 25% error reduction over the same model without word-internal (substring) features.
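To illustrate what word-internal (substring) features look like, here is a minimal Python sketch that turns a single word into character n-gram feature strings of the kind a maximum-entropy model could consume. The function name `char_ngrams`, the boundary markers `<` and `>`, the `SUB=` prefix, and the `n_max` cutoff are all illustrative assumptions, not the exact feature set used in the paper.

```python
def char_ngrams(word, n_max=4, boundary="<>"):
    """Return character n-gram features (lengths 1..n_max) for a word.

    Hypothetical helper: the word is padded with start/end markers so that
    prefixes and suffixes become distinct features from internal substrings.
    """
    start, end = boundary[0], boundary[1]
    padded = start + word + end
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            # A lone boundary marker carries no information, so skip it.
            if gram in (start, end):
                continue
            feats.append("SUB=" + gram)
    return feats


if __name__ == "__main__":
    print(char_ngrams("Inc", n_max=2))
```

For example, `char_ngrams("Inc", n_max=2)` yields `SUB=I`, `SUB=n`, `SUB=c`, `SUB=<I`, `SUB=In`, `SUB=nc`, and `SUB=c>`; boundary-padded grams such as `SUB=c>` let a model tell a suffix apart from the same characters occurring mid-word.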

Dan Klein, Joseph Smarr, Huy Nguyen and Christopher D. Manning, Named Entity Recognition with Character-Level Models. In: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 180-183.
Last update: June 11, 2003.