Towards Conversational Diagnostic AI

social-sciences
hci
The AI system AMIE outperformed primary care physicians (PCPs) on diagnostic accuracy and on consultation quality, as rated by specialist physicians and patient actors in simulated consultations, but translation to real-world practice requires further research.
Authors

Tao Tu

Anil Palepu

Mike Schaekermann

Khaled Saab

Jan Freyberg

Ryutaro Tanno

Amy Wang

Brenna Li

Mohamed Amin

Nenad Tomasev

Shekoofeh Azizi

Karan Singhal

Yong Cheng

Le Hou

Albert Webson

Kavita Kulkarni

S Sara Mahdavi

Christopher Semturs

Juraj Gottweis

Joelle Barral

Katherine Chou

Greg S Corrado

Yossi Matias

Alan Karthikesalingam

Vivek Natarajan

Published

January 11, 2024

Summary

In the paper “Towards Conversational Diagnostic AI,” the authors introduce AMIE, an AI system optimized for diagnostic dialogue. They compare AMIE’s performance to that of primary care physicians (PCPs) in a study of text-based consultations with simulated patient actors. The study finds that AMIE demonstrated greater diagnostic accuracy and superior performance on several axes according to specialist physicians and patient actors. The evaluation framework encompasses history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. Additionally, the authors detail the datasets used to develop AMIE, including medical reasoning, long-form medical question answering, medical summarization, and real-world dialogue datasets. They describe a simulated learning environment for diagnostic dialogues, a self-play framework for iterative improvement, instruction fine-tuning, and a chain-of-reasoning strategy for online inference.

Major Findings

  1. AMIE showed greater diagnostic accuracy than primary care physicians (PCPs), and superior performance on multiple axes, in simulated text-based consultations.
  2. Both patient actors and specialist physicians rated AMIE’s conversation quality above that of PCPs on several axes, including communication skills and empathy.
  3. The study introduced a novel self-play environment for learning and a chain-of-reasoning strategy for online inference, both of which are presented as contributing to AMIE’s performance and capabilities.

Methods and Results

AMIE: An LLM-based AI System for Diagnostic Dialogue

  • The authors trained AMIE on a diverse mix of real-world datasets covering medical reasoning, long-form medical question answering, medical summarization, and clinical dialogue.
  • They developed a simulated dialogue learning environment with a self-play framework for iterative improvement, an instruction fine-tuning process, and a chain-of-reasoning strategy for online inference.
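
As a rough illustration of what a chain-of-reasoning strategy at inference time can look like, the sketch below runs a fixed sequence of intermediate steps before producing the reply shown to the patient. It is a minimal sketch under stated assumptions, not the authors' implementation: the stage wording in `STAGES`, the function `chain_of_reasoning`, and the stand-in `echo_model` are all hypothetical, and this summary does not list the actual steps AMIE uses.

```python
from typing import Callable, List

# Hypothetical stage instructions (an assumption for illustration only);
# the summary does not describe the real steps.
STAGES = [
    "Summarize the patient information gathered so far.",
    "Draft the next reply to the patient based on that summary.",
    "Revise the draft for accuracy and empathy, then output the final reply.",
]


def chain_of_reasoning(dialogue: List[str], generate: Callable[[str], str]) -> str:
    """Run the stages in order, feeding each stage's output into the next."""
    context = "\n".join(dialogue)
    previous = ""
    for instruction in STAGES:
        prompt = (
            f"{context}\n\n"
            f"Previous step output:\n{previous}\n\n"
            f"Task: {instruction}"
        )
        previous = generate(prompt)
    # Only the output of the last stage is returned as the visible reply.
    return previous


def echo_model(prompt: str) -> str:
    """Stand-in for a language model call: echoes the task it was given."""
    return "[model output for] " + prompt.rsplit("Task: ", 1)[1]


if __name__ == "__main__":
    history = ["Patient: I have had chest pain since yesterday."]
    print(chain_of_reasoning(history, echo_model))
```

Passing the model as a plain callable keeps the control flow separate from any particular model API, so the same loop can be exercised with a stub, as here, or with a real client.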

Objective Structured Clinical Examination

  • The evaluation was a randomized, double-blind crossover study with 20 PCPs, 20 validated patient actors, and 149 case scenarios.
  • AMIE’s consultations were rated higher than those of PCPs on conversation quality across multiple axes, by both patient actors and specialist physicians.
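
To make the crossover idea concrete, the sketch below builds a schedule in which every case scenario is consulted under both arms, with the order of the arms randomized per scenario. This is an assumption-laden illustration rather than the study's actual protocol: the function `crossover_schedule`, the arm labels, the scenario identifiers, and the seeding scheme are hypothetical, and blinding (hiding which arm produced a transcript from the raters) is not modeled here.

```python
import random
from typing import Dict, List

# Arm labels are illustrative; in a blinded study the raters would not see them.
ARMS = ("AMIE", "PCP")


def crossover_schedule(scenario_ids: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Assign both arms to every scenario, in a randomized order per scenario."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule: Dict[str, List[str]] = {}
    for scenario_id in scenario_ids:
        order = list(ARMS)
        rng.shuffle(order)  # randomize which arm goes first
        schedule[scenario_id] = order
    return schedule


if __name__ == "__main__":
    # 149 matches the number of case scenarios reported in the summary;
    # the identifiers themselves are made up.
    scenarios = [f"case-{i:03d}" for i in range(149)]
    schedule = crossover_schedule(scenarios, seed=42)
    for scenario_id in scenarios[:3]:
        print(scenario_id, schedule[scenario_id])
```

Because each scenario appears in both arms, differences between scenarios affect AMIE and the PCPs alike, while randomizing the order guards against systematic first-versus-second consultation effects.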

Critique

The study has several limitations, including the use of a text-chat interface, which may not be representative of usual clinical consultation settings. The simulated patient actors may not fully reflect the complexity and nuances of real patients, and the study design may not fully capture the challenges of real-world clinical dialogue. Future research should aim to address these limitations and further validate AMIE’s performance in real-world clinical practice.

Overall, the paper contributes to the development of conversational diagnostic AI systems and highlights the potential of AI in improving the quality and accuracy of medical consultations. However, it is important to consider the limitations and contextual factors related to the study design and evaluation. More research is needed to translate AMIE to real-world clinical settings and to validate its performance in diverse healthcare contexts.

Appendix

Model gpt-3.5-turbo-1106
Date Generated 2024-02-26
Abstract http://arxiv.org/abs/2401.05654v1
HTML https://browse.arxiv.org/html/2401.05654v1
Truncated True
Word Count 18673