Data Sets and Codebases
- Data sets and code from my thesis (also used in our IJCAI 2005, ACL 2007, and NIPS 2008 papers)
- Codebase for the LEX algorithm (from our IJCAI 2007 paper, Locating Complex Named Entities in Web Text)
- HMM-LM: an MPI-based parallel HMM package for language modeling (from our NAACL 2010 paper, Improved Extraction Assessment through Better Language Models)
- The Atlasify 240 semantic relatedness data set (from our SIGIR 2012 paper, Explanatory Semantic Relatedness and Explicit Spatialization for Exploratory Search)