David M. Howcroft

What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think

Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors …

David M. Howcroft, Verena Rieser

OTTers: One-turn Topic Transitions for Open-Domain Dialogue

Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics. The one-turn topic transition task …

Karin Sevegnani, David M. Howcroft, Ioannis Konstas, Verena Rieser

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different …

David M. Howcroft, Anya Belz, Miruna Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable …

Anya Belz, Simon Mille, David M. Howcroft

Semantic Noise Matters for Neural Natural Language Generation

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to …

Ondřej Dušek, David M. Howcroft, Verena Rieser

Toward Bayesian Synchronous Tree Substitution Grammars for Sentence Planning

Developing conventional natural language generation systems requires extensive attention from human experts in order to craft complex …

David M. Howcroft, Dietrich Klakow, Vera Demberg

G-TUNA: A Corpus of Referring Expressions in German, Including Duration Information

Corpora of referring expressions elicited from human participants in a controlled environment are an important resource for research on …

David Howcroft, Jorrig Vogels, Vera Demberg

The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density

Natural language generation (NLG) systems rely on corpora for both hand-crafted approaches in a traditional NLG architecture and for …

David M. Howcroft, Dietrich Klakow, Vera Demberg

Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking

While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language …

David M. Howcroft, Vera Demberg

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when …

Maximilian Schwenger, Álvaro Torralba, Joerg Hoffmann, David M Howcroft, Vera Demberg