Large Language Models As MOOCs Graders
Summary:
Large Language Models (LLMs) are explored as a potential replacement for peer grading in Massive Open Online Courses (MOOCs). The study focuses on three distinct courses: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy, and evaluates the feasibility of replacing peer grading with LLMs across 18 distinct settings. It finds that Zero-shot-CoT prompting, when combined with instructor-provided answers and rubrics, produces grades that align more closely with instructor-assigned grades than peer grading does. However, courses that call for imaginative or speculative thinking, such as History and Philosophy of Astronomy, remain challenging for both LLMs and peer graders. The study points to a promising direction for automating MOOC grading, especially in subjects with well-defined rubrics.
Major Findings:
- Zero-shot-CoT prompting, combined with instructor-provided answers and rubrics, yields grades more closely aligned with instructor-assigned grades than peer grading.
- Courses that require imaginative or speculative thinking, such as History and Philosophy of Astronomy, are challenging for both LLMs and peer graders.
- LLMs show promise for automating MOOC grading, especially in subjects with well-defined rubrics.
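The best-performing setting above combines Zero-shot-CoT prompting with the instructor's reference answer and rubric. A minimal sketch of that setup is shown below; the prompt template, rubric wording, score scale, and parsing logic are illustrative assumptions, not the authors' exact prompts.

```python
import re

def build_grading_prompt(question: str, rubric: str,
                         instructor_answer: str, student_answer: str) -> str:
    """Assemble a zero-shot chain-of-thought grading prompt that gives the
    model the rubric and the instructor's reference answer as context."""
    return (
        f"Question: {question}\n"
        f"Rubric: {rubric}\n"
        f"Instructor answer: {instructor_answer}\n"
        f"Student answer: {student_answer}\n"
        # The trigger phrase elicits step-by-step reasoning (Zero-shot-CoT),
        # followed by a machine-parseable final line.
        "Let's think step by step, then end with a final line "
        "'Score: X' where X is an integer from 0 to 3."
    )

def parse_score(model_output: str):
    """Extract the integer score from the model's final 'Score: X' line;
    return None if no score line is found."""
    match = re.search(r"Score:\s*(\d+)", model_output)
    return int(match.group(1)) if match else None
```

In practice the prompt would be sent to the LLM (here, gpt-3.5-turbo-1106) and `parse_score` applied to its reply; the hypothetical 0-3 scale would be replaced by whatever the course rubric actually uses.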
Analysis and Critique:
- The study demonstrates the potential of LLMs in automating grading systems for MOOCs, but it also highlights the challenges in grading courses that require imaginative or speculative thinking.
- The findings suggest that further research is needed to address the limitations of LLMs in grading assignments that involve creative and abstract thinking.
- Methodological issues related to the integration of LLMs in grading systems should be further explored to ensure reliability and validity.
Appendix
| Field | Value |
| --- | --- |
| Model | gpt-3.5-turbo-1106 |
| Date Generated | 2024-02-26 |
| Abstract | https://arxiv.org/abs/2402.03776v1 |
| HTML | https://browse.arxiv.org/html/2402.03776v1 |
| Truncated | False |
| Word Count | 12158 |