As robotaxis increasingly share the road with human drivers, creating human-like driving agents has emerged as a promising approach for integrating them seamlessly into mixed-autonomy environments. Achieving effective vehicle-human interaction remains a clear, long-term challenge. One notable direction is to use Large Language Models (LLMs) as driving agents, given their human-like reasoning abilities and their adaptability to diverse and dynamic driving scenarios. However, a foundational question arises: do LLMs really drive like humans, and to what extent? In this paper, we present a novel framework to evaluate whether LLM- and multimodal LLM-based agents can authentically simulate human driving behavior. To this end, we introduce Hermes Sense, a benchmark dataset grounded in real-world human driving behavior that integrates both sensory and semantic information. Built upon this dataset, our evaluation methodology assesses behavioral alignment from two perspectives. The short-term decision-making perspective quantifies how closely the models reproduce human driving decisions under identical conditions. The long-term statistical perspective quantifies how closely they do so when conditioned on specific driver profiles. To further illuminate the factors that shape human-like behavior, we conduct a feature attribution analysis to identify which input components most influence the LLM agents' driving decisions. We evaluated thirteen state-of-the-art LLMs from five model families, revealing notable differences in their ability to mimic human driving behavior. The proposed dataset, benchmark, and feature attribution insights lay a solid foundation for optimally integrating information flow in future autonomous vehicle (AV) and robotics research, fostering a more transparent and reliable human-machine collaborative environment.
Can LLM Agents Drive Like Human Beings? Benchmarks, Feature Attributions, Interpretations and Insights
Summary
Related Publications
- Wang, P., Zhao, Y., Wu, G., Lin, Y., and Yang, H.F., Can LLM Agents Drive Like Human Beings? Benchmarks, Feature Attributions, Interpretations and Insights. Transportation Research Part C: Emerging Technologies, In press.
