Personae Corpus
- for Author and Personality Prediction -
The PERSONAE corpus was collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays, written by 145 different students (BA in Linguistics and Literature at the University of Antwerp, Belgium). Each student also took an online MBTI personality test, allowing personality prediction experiments. The corpus was controlled for topic, register, genre, age, and education level.
We make available the original texts, a syntactically annotated version of the texts, and the metadata.
More information about the corpus can be found in the README file distributed with the corpus and in the following publication. Please cite this paper when publishing work based on this corpus.
- Kim Luyckx and Walter Daelemans (2008). Personae, a Corpus for Author and Personality Prediction from Text. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco. [pdf] [bib]
[Acknowledgements]
The construction of the corpus was made possible by a grant from the Flemish Research Foundation (FWO) to Walter Daelemans. For more information on the 'Computational Techniques for Stylometry for Dutch' project, see this page.
The corpus may be freely used for non-commercial research purposes. For other uses, please contact:
Walter Daelemans
CLiPS
University of Antwerp
Prinsstraat 13
2000 Antwerpen
Belgium
walter.daelemans@ua.ac.be
http://www.clips.ua.ac.be/~walter