Reach me on Gmail: dave.howcroft
Find me on: StackOverflow
I research the factors that make a text more or less difficult to read and ways to incorporate this knowledge into natural language generation systems. On the complexity side of things, I am interested in the role played by factors like surprisal, embedding depth, dependency length, and idea density in reading comprehension. In addition to factoring these features into models of generation, I have worked on grammar induction for microplanning and the influence of discourse markers on fluency judgements.
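Surprisal is standardly defined as the negative log probability of a word given its preceding context, -log2 P(w_i | w_1 … w_{i-1}): the less predictable a word is, the higher its surprisal. The sketch below is only an illustration of that definition, assuming a toy bigram model with add-one smoothing rather than the much stronger language models used in reading research; the function names `train_bigram` and `surprisal` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with a start symbol <s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def surprisal(sentence, unigrams, bigrams):
    """Return (word, -log2 P(word | previous word)) for each word.

    Uses add-one (Laplace) smoothing over the training vocabulary, so unseen
    bigrams get a small nonzero probability instead of infinite surprisal.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split()
    result = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        result.append((word, -math.log2(p)))
    return result


corpus = ["the dog barked", "the cat slept", "the dog slept"]
uni, bi = train_bigram(corpus)
for word, bits in surprisal("the dog slept", uni, bi):
    print(f"{word}\t{bits:.2f} bits")
```

In this toy corpus "dog" follows "the" more often than "cat" does, so it receives lower surprisal; a reading-time study would substitute a far larger model but keep the same per-word quantity.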
Cynthia A Johnson, Rachel Steindel Burdin, Rory Turnbull, and I are examining adjectival paradigms in Middle and New High German using expected relative entropy. For an overview of the project, you can check out an old handout.
One of my first papers evaluated the discriminative power of psycholinguistic metrics in ranking sentences according to their complexity. Using the PWKP corpus (Zhu et al. 2010), I trained rankers using both traditional features like word and sentence length and psycholinguistically-motivated features like surprisal and embedding depth. The psycholinguistic features resulted in a small but significant improvement in accuracy.
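The ranking setup described above compares sentence pairs; in PWKP each pair links an English Wikipedia sentence with its Simple English Wikipedia counterpart. The paper's actual ranker and feature set are not reproduced here. This is a minimal sketch, assuming a pairwise perceptron over two hypothetical "traditional" features (sentence length in words and mean word length in characters); `features`, `train_pairwise`, and `score` are illustrative names only.

```python
def features(sentence):
    """Hypothetical traditional features: [sentence length, mean word length]."""
    words = sentence.split()
    n = max(len(words), 1)  # guard against empty input
    return [len(words), sum(len(w) for w in words) / n]


def diff(a, b):
    return [x - y for x, y in zip(a, b)]


def train_pairwise(pairs, epochs=10):
    """Perceptron on feature differences.

    pairs: list of (harder, easier) sentence tuples. The learned weights should
    score the harder sentence above the easier one in each pair.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        for harder, easier in pairs:
            d = diff(features(harder), features(easier))
            if sum(wi * di for wi, di in zip(w, d)) <= 0:  # misranked or tied
                w = [wi + di for wi, di in zip(w, d)]
    return w


def score(w, sentence):
    """Higher score = predicted to be more complex."""
    return sum(wi * fi for wi, fi in zip(w, features(sentence)))


pairs = [("The committee deliberated extensively regarding the proposal",
          "The group talked about the plan")]
weights = train_pairwise(pairs)
print(weights, score(weights, pairs[0][0]) > score(weights, pairs[0][1]))
```

Psycholinguistic features such as mean surprisal or embedding depth would simply be appended to the vector returned by `features`; comparing accuracy with and without them is the kind of contrast the paragraph above reports.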
In 2012 and 2013 I worked with Michael White on the generation of contrastive expressions and presented our work at ENLG.
Unfortunately, there's no video, but you should read the paper if you're interested:
David M. Howcroft, Crystal Nakatsu, and Michael White. 2013. "Enhancing the Expression of Contrast in the SPaRKy Restaurant Corpus". In Proceedings of the 14th European Workshop on Natural Language Generation. [PDF]