The Pitfalls of Defining Hallucination

social-sciences
robustness
Argues that veracity evaluation in NLG lacks clarity and proposes a logic-based synthesis of existing classifications of hallucination and omission.
Author

Kees van Deemter

Published

January 15, 2024

Summary:

The article discusses the challenges of evaluating the veracity of computer-generated text, focusing on hallucination and omission in Natural Language Generation (NLG) and Large Language Models (LLMs). The author examines existing analyses of veracity, proposes a logic-based synthesis to address the limitations of current thinking about hallucination, and considers the implications for the more complicated tasks addressed by LLMs.

Major Findings:

  1. Evaluation of Veracity: Assessing the veracity of computer-generated text is of paramount concern. The article frames the question as whether the text “speaks the truth, the whole truth, and nothing but the truth,” i.e., whether it avoids both omission and hallucination.
  2. Existing Analyses of Veracity: The article surveys various attempts to analyze the problems of hallucination and omission in computer-generated texts, focusing on data-to-text NLG. It compares and contrasts different analyses and highlights the limitations of, and disagreements among, them.
  3. Synthesis of Existing Analyses: The author proposes a synthesis of existing analyses to obtain a systematic perspective applicable to domains of all kinds, addressing the limitations of current analyses and offering a new perspective on evaluating veracity.
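The core distinction in the findings above can be sketched as a toy set-based check. This is a hypothetical simplification (function and variable names are illustrative, not the paper's formalism): a hallucination is a generated claim unsupported by the source data, while an omission is a source fact the text fails to convey. The paper's logic-based synthesis is more nuanced, handling entailment, ambiguity, and vagueness.

```python
# Toy model: treat the source data and the generated text
# as sets of atomic facts and compare them directly.
def classify(source_facts: set[str], generated_claims: set[str]) -> dict[str, set[str]]:
    """Partition discrepancies into hallucinations and omissions."""
    return {
        # Claims in the text that the source data does not support.
        "hallucinations": generated_claims - source_facts,
        # Facts in the source data that the text fails to mention.
        "omissions": source_facts - generated_claims,
    }

# Example: a weather-report data-to-text scenario.
source = {"temp=21C", "wind=NW", "rain=none"}
generated = {"temp=21C", "rain=heavy"}

result = classify(source, generated)
print(sorted(result["hallucinations"]))  # ['rain=heavy']
print(sorted(result["omissions"]))       # ['rain=none', 'wind=NW']
```

In practice the comparison cannot be plain set difference, since a generated sentence may be entailed by, rather than identical to, the data; that gap is precisely where the article argues a logic-based account is needed.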

Analysis and Critique:

The article provides a comprehensive analysis of the challenges of evaluating the veracity of computer-generated text, particularly the issues of hallucination and omission. It also highlights the limitations of existing analyses and the need for further research into evaluating veracity across different NLG tasks, especially those involving LLMs. The author emphasizes the value of collaboration between the NLP community and logicians, and calls for more attention to the “difficult” aspects of communication, such as ambiguity and vagueness. Overall, the article offers valuable insights into these complexities and makes the case for a more nuanced, systematic approach to defining and evaluating hallucination.

Appendix

Model gpt-3.5-turbo-1106
Date Generated 2024-02-26
Abstract https://arxiv.org/abs/2401.07897v1
HTML https://browse.arxiv.org/html/2401.07897v1
Truncated False
Word Count 8110