Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

¹Carnegie Mellon University, ²Massachusetts Institute of Technology, ³Allen Institute for AI

Recent advancements in large language models (LLMs) facilitate nuanced social simulations, yet most studies adopt an omniscient perspective, diverging from real human interactions, which are characterized by information asymmetry. We devise an evaluation framework to explore these differences.

[Cover image: agents vs. script writer]

Abstract

Recent advances in large language models (LLMs) have enabled richer social simulations, allowing for the study of various social phenomena with LLM-based agents. However, most work has used an omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that humans have. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient and non-omniscient). Our experiments show that interlocutors simulated omnisciently are much more successful at accomplishing social goals than non-omniscient agents, despite the latter being the more realistic setting. Furthermore, we demonstrate that learning from omniscient simulations improves the apparent naturalness of interactions but scarcely enhances goal achievement in cooperative scenarios. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.

Motivation

Figure 1: An illustration of Script mode and Agents mode simulation. In Agents mode, two agents, each equipped with its own LLM, negotiate and strategically seek information to reach a mutual agreement. Conversely, in Script mode, a single omniscient LLM orchestrates the entire interaction with full access to both agents' goals. While this appears efficient at first glance, the resulting interaction lacks essential properties of human communication.

People navigate everyday social interactions easily despite not having access to others' mental states (i.e., information asymmetry). As illustrated in Figure 1, two agents bargaining over a price must interact in complex ways to understand each other's motives. Modern LLMs have made simulating such interactions increasingly feasible: from building a town of AI-powered characters to simulating social media platforms and training better chatbot systems, LLMs seem capable of realistically simulating human social interactions.

However, despite these impressive abilities, one key shortcoming has prevented realistic social simulation: a wide range of prior research has adopted an omniscient perspective to model and simulate social interactions. By generating all sides of an interaction at once, or by making agents' goals transparent to all participants, these simulations diverge from non-omniscient human interactions, which rely on social inference to achieve goals in the real world. Studying such omniscient simulations can therefore lead to biased or incorrect conclusions.

Simulating Society for Analysis

To investigate the effect of this incongruity, we create a unified simulation framework by building on Sotopia. We set up two modes for simulating human interaction with LLMs: Script mode and Agents mode. As shown in Figure 1, in Script mode one omniscient LLM has access to all the information and generates the entire dialogue from a third-person perspective. In Agents mode, two LLMs assume distinct roles and interact to accomplish their social goals despite information asymmetry.
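To make the contrast concrete, below is a minimal sketch of the two modes. It is not the actual Sotopia interface: the `chat` callable, function names, and prompt wording are hypothetical, but the key difference is what each generating LLM gets to see. The `mind_readers` flag corresponds to the ablation introduced with Figure 2 below.

```python
# Minimal sketch of the two simulation modes. Illustrative only, not the
# actual Sotopia interface; `chat`, the function names, and the prompt
# wording are hypothetical.
from typing import Callable, List

LLM = Callable[[str], str]  # any function that maps a prompt to generated text


def simulate_script_mode(chat: LLM, scenario: str, goal_a: str, goal_b: str) -> str:
    """One omniscient LLM sees BOTH goals and writes the whole dialogue."""
    prompt = (
        f"Scenario: {scenario}\n"
        f"Agent A's goal: {goal_a}\n"
        f"Agent B's goal: {goal_b}\n"
        "Write the full conversation between Agent A and Agent B."
    )
    return chat(prompt)


def simulate_agents_mode(
    chat_a: LLM, chat_b: LLM, scenario: str, goal_a: str, goal_b: str,
    max_turns: int = 10, mind_readers: bool = False,
) -> List[str]:
    """Two LLMs take turns; each sees only its own goal (information asymmetry).

    `mind_readers=True` removes the asymmetry by also revealing the partner's
    goal, mirroring the ablation described with Figure 2 below.
    """
    history: List[str] = []
    for turn in range(max_turns):
        speaker, own_goal, other_goal, chat = (
            ("A", goal_a, goal_b, chat_a) if turn % 2 == 0
            else ("B", goal_b, goal_a, chat_b)
        )
        prompt = (
            f"Scenario: {scenario}\n"
            f"You are Agent {speaker}. Your goal: {own_goal}\n"
            + (f"Your partner's goal: {other_goal}\n" if mind_readers else "")
            + "Conversation so far:\n" + "\n".join(history)
            + f"\nAgent {speaker}:"
        )
        history.append(f"Agent {speaker}: {chat(prompt)}")
    return history
```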

Figure 2: Average goal completion score of models across different modes in various settings. Overall contains all scenarios, while the other two groups contain representative cooperative and competitive scenarios. We perform pairwise t-tests, and * denotes a score that is statistically significantly different from the other two modes in that setting. To study the effects of information asymmetry, we add an ablation setting in which information asymmetry is removed from the Agents mode simulation by giving each agent access to the other characters' information (e.g., social goals); we refer to this setting as Mind readers.

As shown in Figure 2, there are drastic disparities across these modes in terms of achieving social goals and in naturalness. The Script mode significantly overestimates the ability of LLM agents to achieve social goals, while LLM-based agents struggle to act in situations with information asymmetry. Additionally, the Agents mode generates interactions that sound significantly less natural, further highlighting the gap between these simulation modes.
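As a side note on the significance tests mentioned in the Figure 2 caption, a mode-versus-mode comparison could be run roughly as in the sketch below. The per-scenario scores are made up and the choice of a paired test is an assumption for illustration, not the paper's analysis code.

```python
# Hypothetical significance check between two simulation modes, assuming
# matched per-scenario goal-completion scores (made-up numbers below).
from scipy import stats

script_scores = [8.2, 7.9, 9.0, 6.5, 8.8]  # Script mode, one score per scenario
agents_scores = [5.1, 6.0, 7.2, 4.8, 6.3]  # Agents mode, same scenarios

# Paired t-test, since the same scenarios are simulated under both modes.
t_stat, p_value = stats.ttest_rel(script_scores, agents_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```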

Simulating Interactions for Training

We then ask whether LLM agents can learn from Script mode simulations. We finetune GPT-3.5 on a large dataset of interactions generated omnisciently. We find that, through finetuning, Agents mode models become more natural yet barely improve in cooperative scenarios with information asymmetry. Further analysis shows that Script mode simulations contain information leakage in cooperative scenarios and tend to produce overly agreeable interlocutors in competitive settings.
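As a rough illustration of how omniscient transcripts can be turned into agent-level training data, the sketch below splits a Script mode dialogue into per-agent fine-tuning examples whose prompts contain only what that agent could legitimately see. The function and field names are hypothetical; this is not the paper's exact pipeline.

```python
# Hypothetical preprocessing: convert an omnisciently generated transcript into
# per-agent fine-tuning examples (prompt = only what that agent could see).
from typing import Dict, List, Tuple


def script_to_agent_examples(
    scenario: str,
    goals: Dict[str, str],               # e.g. {"A": "...", "B": "..."}
    transcript: List[Tuple[str, str]],   # [("A", "utterance"), ("B", ...), ...]
) -> List[Dict[str, str]]:
    examples: List[Dict[str, str]] = []
    history: List[str] = []
    for speaker, utterance in transcript:
        prompt = (
            f"Scenario: {scenario}\n"
            f"You are Agent {speaker}. Your goal: {goals[speaker]}\n"
            "Conversation so far:\n" + "\n".join(history)
            + f"\nAgent {speaker}:"
        )
        examples.append({"prompt": prompt, "completion": f" {utterance}"})
        history.append(f"Agent {speaker}: {utterance}")
    return examples
```

Note that even with prompts restricted this way, the completions themselves were produced by a generator that saw both goals, so they can still carry leaked information, which is consistent with the leakage observed in the analysis above.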

Conclusion

Our findings suggest that the success of LLMs in simulating social interactions in Script mode can be misleading. While simulations generated from the third-person perspective of Script mode score highly on goal completion and dialogue fluidity, the conversational strategies these LLMs use over-rely on having direct access to the internal states of both parties. These artifacts hinder the Script mode's ability to simulate human-like interactions and likely lead to an overestimation of the social capabilities of LLMs. Based on our findings, we provide recommendations for reporting work on LLM-based agents, encouraging more careful consideration and transparency in using LLMs to simulate social interactions, from both data and learning perspectives.