David M. Howcroft
reproducibility
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different …
David M. Howcroft, Anya Belz, Miruna Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser
PDF
Dataset
Slides
ACL Anthology
Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing
Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable …
Anya Belz, Simon Mille, David M. Howcroft
PDF
ACL Anthology