Can LLM Agents Drive Like Human Beings? Benchmarks, Feature Attributions, Interpretations and Insights

Summary

As robotaxis increasingly share roads with human drivers, creating human-like driving agents has emerged as a promising approach for seamless integration into mixed-autonomy environments. Achieving effective vehicle-human interaction is a clear, long-term challenge. A notable direction is leveraging Large Language Models (LLMs) as driving agents, given their human-like reasoning abilities and adaptability to diverse and dynamic driving scenarios. However, a foundational question arises: do LLMs really drive like human drivers, and to what extent? In this paper, we present a novel framework to evaluate whether LLM- and multimodal LLM-based agents can authentically simulate human driving behavior. To this end, we introduce Hermes Sense, a benchmark dataset grounded in real-world human driving behavior that integrates both sensory and semantic information. Built upon this dataset, our evaluation methodology assesses behavioral alignment from both short-term decision-making and long-term statistical perspectives, quantifying how closely these models reproduce human-like driving decisions under identical conditions and when conditioned on specific driver profiles, respectively. To further illuminate the factors shaping human-like behavior, we conduct a data attribution analysis to identify which components of the input most influence the LLM agents’ behavioral decisions. We evaluated thirteen state-of-the-art LLMs from five families, revealing notable differences in their ability to mimic human driving behavior. The proposed dataset, benchmark, and feature attribution insights lay a foundation for better integrating information flow in autonomous vehicle (AV) and robotics research, fostering a more transparent and reliable human-machine collaborative environment.

Related Publications

  1. Wang, P., Zhao, Y., Wu, G., Lin, Y., and Yang, H.F., Can LLM Agents Drive Like Human Beings? Benchmarks, Feature Attributions, Interpretations and Insights. Transportation Research Part C: Emerging Technologies, In press.