Policy Improvement using Language Feedback Models
Summary:
The article introduces Language Feedback Models (LFMs), which identify desirable behavior for imitation learning in instruction following. An LFM is trained on feedback that a Large Language Model (LLM) gives on visual trajectories verbalized into language descriptions. The trained LFM then identifies desirable behavior, which the policy learns to imitate. On three language grounding environments (Touchdown, ScienceWorld, and ALFWorld), LFMs improve task-completion rate over strong behavioral cloning baselines, outperform LLMs used directly as experts when the number of LLM output tokens is held constant, and generalize to unseen environments after one round of adaptation. LFMs can also provide human-interpretable feedback without performance loss, which lets humans verify the behavior being imitated.
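The training pipeline described above (verbalize each trajectory step, ask an LLM whether the step is desirable, then train an LFM to reproduce those judgements) can be sketched as a toy. All names are hypothetical: `verbalize`, `mock_llm_feedback`, and `FeedbackModel` are illustrative only. The LLM call is replaced by a keyword-overlap rule, and a bag-of-words perceptron stands in for the model the article actually trains.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    observation: str
    action: str


def verbalize(instruction: str, step: Step) -> str:
    # Hypothetical verbalizer: renders one trajectory step as plain text.
    return f"goal: {instruction} | obs: {step.observation} | act: {step.action}"


def mock_llm_feedback(instruction: str, step: Step) -> bool:
    # Stand-in for prompting an LLM "was this step productive?".
    # Toy rule (an assumption): the action mentions a word from the goal.
    return bool(set(step.action.split()) & set(instruction.split()))


class FeedbackModel:
    """Toy LFM: a perceptron over bag-of-words features of verbalized steps."""

    def __init__(self) -> None:
        self.weights: Counter = Counter()
        self.bias = 0

    def score(self, text: str) -> int:
        return self.bias + sum(self.weights[w] for w in text.split())

    def predict(self, text: str) -> bool:
        return self.score(text) > 0

    def fit(self, examples, max_epochs: int = 50) -> "FeedbackModel":
        for _ in range(max_epochs):
            mistakes = 0
            for text, label in examples:
                if self.predict(text) != label:
                    mistakes += 1
                    sign = 1 if label else -1
                    for w in text.split():
                        self.weights[w] += sign
                    self.bias += sign
            if mistakes == 0:  # every training example is classified correctly
                break
        return self


instruction = "put mug in microwave"
trajectory = [
    Step("desk", "take mug"),
    Step("hall", "open door"),
    Step("kitchen", "open microwave"),
    Step("hall", "look around"),
]

# 1) Verbalize each step, 2) collect (mock) LLM feedback, 3) distill it into the LFM.
texts = [verbalize(instruction, s) for s in trajectory]
labels = [mock_llm_feedback(instruction, s) for s in trajectory]
lfm = FeedbackModel().fit(list(zip(texts, labels)))
print([lfm.predict(t) for t in texts])  # [True, False, True, False]
```

In this sketch, once `lfm` is trained, new steps are scored without any further (mock) LLM calls; for example, `lfm.predict(verbalize(instruction, Step("hall", "take mug")))` returns `True`.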
Major Findings:
- LFMs improve task-completion rate over strong behavioral cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld).
- LFMs outperform LLMs used directly as experts to predict actions when the number of LLM output tokens is held constant.
- LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation.
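One round of adaptation to an unseen environment can be sketched as a loop: explore, let the feedback model mark desirable steps, and imitate only those. Everything here is a hypothetical toy. `NEW_ENV`, `ADMISSIBLE`, `LFM_DESIRABLE`, `BCPolicy`, and `adapt_one_round` are illustrative names, the trained LFM is replaced by a fixed lookup set, and exhaustive exploration replaces sampled rollouts. The sketch shows the shape of the loop only and does not reproduce the reported 3.5-12.0% gains.

```python
from collections import Counter, defaultdict

# Hypothetical unseen environment: observations in visit order, each paired with
# the one action that makes progress. The episode succeeds only if all are right.
NEW_ENV = [
    ("mug on desk", "take mug"),
    ("microwave closed", "open microwave"),
    ("microwave open", "put mug in microwave"),
]
ADMISSIBLE = ["take mug", "open microwave", "put mug in microwave", "open door", "look around"]

# Stand-in for a trained LFM (an assumption): the (observation, action) pairs it
# rates as desirable. A real LFM would be a learned model over verbalized steps.
LFM_DESIRABLE = {
    ("mug on desk", "take mug"),
    ("microwave closed", "open microwave"),
    ("microwave open", "put mug in microwave"),
}


class BCPolicy:
    """Toy behavioral-cloning policy: the most-imitated action per observation."""

    def __init__(self) -> None:
        self.counts: defaultdict = defaultdict(Counter)

    def imitate(self, steps) -> "BCPolicy":
        for obs, action in steps:
            self.counts[obs][action] += 1
        return self

    def act(self, obs: str) -> str:
        seen = self.counts.get(obs)
        # Fall back to a no-op action for observations never imitated.
        return seen.most_common(1)[0][0] if seen else "look around"


def run_episode(policy: BCPolicy) -> bool:
    return all(policy.act(obs) == correct for obs, correct in NEW_ENV)


def adapt_one_round(policy: BCPolicy, observations):
    # 1) Explore: try every admissible action at each observation
    #    (a toy substitute for sampling rollouts from the policy).
    rollouts = [(obs, a) for obs in observations for a in ADMISSIBLE]
    # 2) Keep only the steps the feedback model marks as desirable.
    kept = [step for step in rollouts if step in LFM_DESIRABLE]
    # 3) Behavioral cloning on the filtered steps (updates the policy in place).
    return policy.imitate(kept), len(kept)


# Base policy cloned from demonstrations in a different (training) environment.
policy = BCPolicy().imitate([("mug on desk", "take mug"), ("fridge closed", "open fridge")])
before = run_episode(policy)
policy, n_kept = adapt_one_round(policy, [obs for obs, _ in NEW_ENV])
after = run_episode(policy)
print(before, n_kept, after)  # False 3 True
```

Filtering before imitation is the central design choice here: the policy never clones steps the feedback model rejects, so `open door` and `look around` contribute nothing to the update.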
Analysis and Critique:
- The article does not address potential biases in the LLM feedback that could influence the training of LFMs.
- The comparison with DAgger shows that LFMs outperform using an LLM as the expert for imitation learning, but the reasons for this performance difference deserve further investigation.
- The article does not discuss the potential ethical implications of using LFMs for policy improvement, especially in real-world applications. Further exploration of the broader impact of LFMs is necessary.
Appendix
| Field | Value |
|---|---|
| Model | gpt-3.5-turbo-1106 |
| Date Generated | 2024-02-26 |
| Abstract | https://arxiv.org/abs/2402.07876v1 |
| HTML | https://browse.arxiv.org/html/2402.07876v1 |
| Truncated | False |
| Word Count | 8834 |