Using Elo rating as a metric for comparative judgement in educational assessment

Gray, A ORCID: 0000-0002-1150-2052, Rahat, A.A.M, Crick, T, Lindsay, S and Wallace, D (2022) 'Using Elo rating as a metric for comparative judgement in educational assessment.' In: ICEMT '22: Proceedings of the 6th International Conference on Education and Multimedia Technology. Association for Computing Machinery, Guangzhou, China, pp. 272-278. ISBN 9781450396455

Official URL:


Marking and feedback are essential features of teaching and learning across the overwhelming majority of educational settings and contexts. However, it can take a great deal of time and effort for teachers to mark assessments and to provide useful feedback to students. It also places a significant cognitive load on assessors, especially in ensuring fairness and equity. Therefore, an alternative approach to marking called comparative judgement (CJ), inspired by the law of comparative judgment (LCJ), has been proposed in the educational space. The key idea is that a suitably qualified or experienced assessor identifies the better submission in a pair. These pairwise comparisons, made for as many pairs as possible, can then be used to rank all submissions. Studies suggest that CJ is highly reliable and accurate while being quick for teachers. Other studies have questioned this claim, suggesting that the process can increase bias in the results because the same submission is shown to an assessor many times to increase reliability. Studies have also found that CJ can make the overall marking process take longer than a more traditional method of marking, as information about many pairs must be collected. There is a clear need to investigate the efficacy of alternative rating and ranking systems that do not require extensive data on every pair of submissions, in order to reduce the temporal and cognitive burden on assessors and the bias from observing the same submission repeatedly. In this paper, we investigate Elo, which has been extensively used for rating players in zero-sum games such as chess, for devising a ranking between submissions in a comparative judgement context.
We experimented with a large-scale Twitter dataset on the topic of a recent major UK political event (“Brexit”, the UK's political exit from the European Union), asking users which tweet in a pair selected from ten tweets they found funnier. Our analysis of the data reveals that the Elo rating is statistically significantly similar to the CJ ranking with a Kendall's tau score of 0.96 and a p-value of . We finish with an informed discussion regarding the potential wider application of this approach to a range of educational contexts.
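
As an illustration of how an Elo-style rating can turn pairwise judgements into a ranking, the following is a minimal sketch, not the authors' implementation. It uses the standard chess Elo update; the K-factor of 32, the initial rating of 1500, the function names, and the toy judgement data are all assumptions chosen for the example and are not taken from the paper.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(judgements, k=32.0, initial=1500.0):
    """Rate items from an ordered sequence of (winner, loser) judgements.

    k and initial are illustrative defaults (assumptions), not values
    reported in the paper.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in judgements:
        e_w = expected_score(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up (zero-sum update).
        delta = k * (1.0 - e_w)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def rank(ratings):
    """Order items from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical judgements: each tuple is (preferred item, other item).
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
ratings = elo_ratings(judgements)
ranking = rank(ratings)
print(ranking)
```

Each judgement updates only the two items involved, so a rating is available without collecting data on every pair. A ranking produced this way could then be compared with a CJ ranking using a rank correlation such as Kendall's tau, for example with `scipy.stats.kendalltau`.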

Item Type: Book Chapter or Section
UN SDGs: Goal 9: Industry, Innovation and Infrastructure
Keywords: assessment, Bradley-Terry Model, comparative judgement, Elo rating, marking, teaching and learning
Subjects: L Education > L Education (General)
T Technology > T Technology (General)
Divisions: Bath School of Design
Identification Number:
Related URLs:
Date Deposited: 29 May 2024 14:17
Last Modified: 29 May 2024 14:22
URI / Page ID: