Large Language Models are Geographically Biased

social-sciences
education
LLMs carry biases from their training data, which can lead to geographic bias and systematic errors.
Authors: Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, Stefano Ermon

Published: February 5, 2024

Summary:

  • Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm.
  • The study probes what LLMs know about the world through the lens of geography, focusing on zero-shot geospatial predictions.
  • The study demonstrates that LLMs can make accurate zero-shot geospatial predictions, yet exhibit common geographic biases across a range of objective and subjective topics; a minimal prompting sketch follows this list.
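
As a rough illustration of zero-shot geospatial prediction, the sketch below asks a chat model to rate a topic at a named location on a bounded numeric scale and parses the reply. The prompt wording, the 0.0–9.9 scale, and the `rate_location` helper are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of zero-shot geospatial rating via a chat LLM.
# Assumptions: OPENAI_API_KEY is set in the environment; the prompt wording
# and the 0.0-9.9 scale are illustrative, not the paper's exact protocol.
from openai import OpenAI

client = OpenAI()

def rate_location(topic: str, place: str, model: str = "gpt-3.5-turbo") -> float:
    """Ask the model for a single numeric rating of `topic` at `place`."""
    prompt = (
        f"On a scale of 0.0 to 9.9, rate {topic} at the location {place}. "
        "Respond with the number only."
    )
    reply = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(reply.choices[0].message.content.strip())

# Example: compare two locations on an objective topic.
# print(rate_location("infant survival rate", "Nairobi, Kenya"))
# print(rate_location("infant survival rate", "Oslo, Norway"))
```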

Major Findings:

  1. LLMs can make highly accurate zero-shot geospatial predictions, showing strong monotonic correlation with ground truth; a minimal evaluation sketch follows this list.
  2. LLMs exhibit geographic biases across a range of both objective and subjective topics, and are particularly biased against areas with lower socioeconomic conditions.
  3. All LLMs are likely biased to some degree, but the magnitude of bias varies significantly across existing models.
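
The "strong monotonic correlation" in finding 1 is naturally measured with a rank correlation such as Spearman's ρ. The snippet below is a minimal sketch of that evaluation using made-up numbers; it is not the paper's data or exact evaluation code.

```python
# Minimal sketch: rank-correlate zero-shot LLM ratings with ground truth.
# The values below are made up for illustration; they are not the paper's data.
from scipy.stats import spearmanr

llm_ratings  = [7.1, 3.4, 8.2, 5.0, 2.9]        # zero-shot ratings, one per location
ground_truth = [72.0, 58.5, 81.3, 64.2, 55.1]   # e.g., life expectancy in years

rho, p_value = spearmanr(llm_ratings, ground_truth)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```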

Analysis and Critique:

  • The study provides valuable insights into the biases present in LLMs, particularly in the context of geospatial predictions.
  • The findings highlight the need for further research and development to mitigate these biases in LLMs, especially on sensitive subjective topics.
  • The study’s focus on geographic bias adds a new dimension to the understanding of biases in LLMs, contributing to the broader conversation on fairness and accuracy in language models.

Appendix

Model gpt-3.5-turbo-1106
Date Generated 2024-02-26
Abstract https://arxiv.org/abs/2402.02680v1
HTML https://browse.arxiv.org/html/2402.02680v1
Truncated False
Word Count 15193