Large Language Models as Minecraft Agents

Tags: architectures, production, hci, education
TL;DR: This study evaluates LLMs as Minecraft agents, introduces clarification questions, and presents an online interaction platform.
Authors

Chris Madge, Massimo Poesio

Published

February 13, 2024

Summary:

  • The study examines the use of Large Language Models (LLMs) as Minecraft agents in the builder and architect settings.
  • The authors introduce clarification questions and discuss the challenges and opportunities for improving LLM use in this setting.
  • They also present a platform for online interaction with the agents and evaluate their performance against previous works.

Major Findings:

  1. The study demonstrates that LLMs can act as agents in a Minecraft-like block world task, with the builder agent performing favorably against past bespoke models.
  2. LLMs have a built-in capability to ask and answer questions, a valuable feature for interactive agents.
  3. GPT-4 and GPT-3.5 perform similarly and outperform the IGLU NLP evaluation baseline, while other LLMs do not perform as well.

Analysis and Critique:

  • The study provides valuable insights into the potential of LLMs as agents in interactive environments, particularly in the context of Minecraft. However, the evaluation of the architect agent reveals challenges, indicating the need for further quantitative evaluation against existing architect models.
  • The study’s focus on the language component of the task is valuable, but it may benefit from further exploration of the task of manipulating an agent to place blocks, as done in previous work.
  • The authors acknowledge the need for future work to improve openly available LLMs to close the gap with fine-tuned baselines, suggesting potential areas for further research and development.

Appendix

Model: gpt-3.5-turbo-1106
Date Generated: 2024-02-26
Abstract: https://arxiv.org/abs/2402.08392v1
HTML: https://browse.arxiv.org/html/2402.08392v1
Truncated: False
Word Count: 7371