Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals
Published in Findings of EMNLP 2024 (ACL Anthology)
We propose a counterfactual intervention method using Large Language Models (LLMs) to reveal that, in automated essay scoring, BERT-like models focus on sentence-level features, whereas LLMs align more comprehensively with scoring rubrics by emphasizing conventions, accuracy, language complexity, and organization. The source code and data for this paper are available on GitHub.
Recommended citation: Yupei Wang, Renfen Hu, and Zhe Zhao. 2024. Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8906–8925, Miami, Florida, USA. Association for Computational Linguistics.
Download Paper | Download Slides