LLM Responses to Social Surveys

Assessing the representativeness of LLM-generated item responses using latent class analysis

The use of Large Language Models (LLMs) in psychometric research provides a potentially powerful tool for researchers and item developers. Previous work has examined how LLMs respond to items and the potential of simulated respondents for psychometric research (Liu et al., 2024) and survey research (Jansen et al., 2023). Despite being trained on data written by humans, some researchers have found deficiencies in LLMs' ability to reproduce human-like response behaviors (Petrov et al., 2024; Tjuatja et al., 2024). Others nonetheless advocate for using LLMs to simulate human behavior and item responses, a technique referred to as "silicon sampling" (Kim & Lee, 2024; Kozlowski et al., 2024). Using the General Social Survey (GSS), we seek to answer two research questions: (1) how well do LLMs reproduce observed response distributions for social survey items, and (2) do LLM-generated responses reproduce the latent classes and demographic profiles found in observed responses? By addressing these questions, we aim to give researchers realistic expectations about LLM response generation performance and methods by which they can assess LLM responses.
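As a minimal sketch of the kind of check implied by research question (1), the Python snippet below compares a hypothetical LLM-generated response distribution against an observed human distribution for a single categorical item, using a chi-square goodness-of-fit test and total variation distance. The item, counts, and choice of statistics are illustrative assumptions, not the study's actual method.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical response counts for a 5-category GSS-style item.
# These numbers are made up for illustration only.
human_counts = np.array([120, 340, 510, 280, 90])   # observed human responses
llm_counts   = np.array([ 60, 410, 620, 150, 40])   # LLM-generated responses

# Rescale the human proportions to the LLM sample size so the totals match,
# then test whether the LLM responses depart from the human distribution.
expected = human_counts / human_counts.sum() * llm_counts.sum()
stat, p = chisquare(f_obs=llm_counts, f_exp=expected)

# Total variation distance as an effect-size-style summary of the gap.
tvd = 0.5 * np.abs(llm_counts / llm_counts.sum()
                   - human_counts / human_counts.sum()).sum()

print(f"chi-square = {stat:.1f}, p = {p:.3g}, TVD = {tvd:.3f}")
```

A per-item comparison like this is only a starting point; assessing research question (2) would additionally require fitting latent class models to both response sets and comparing the recovered class structures.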

Materials: