Survey Paper

International Journal of Control, Automation, and Systems 2024; 22(8): 2341-2384

Published online August 1, 2024 https://doi.org/10.1007/s12555-024-0438-7

© The International Journal of Control, Automation, and Systems

Unlocking Robotic Autonomy: A Survey on the Applications of Foundation Models

Dae-Sung Jang, Doo-Hyun Cho, Woo-Cheol Lee, Seung-Keol Ryu, Byeongmin Jeong, Minji Hong, Minjo Jung, Minchae Kim, Minjoon Lee, SeungJae Lee, and Han-Lim Choi*

KAIST

Abstract

The advancement of foundation models, such as large language models (LLMs), vision-language models (VLMs), diffusion models, and robotics foundation models (RFMs), has become a new paradigm in robotics by offering innovative approaches to the long-standing challenge of building robot autonomy. These models enable the development of robotic agents that can independently understand and reason about semantic contexts, plan actions, physically interact with surroundings, and adapt to new environments and untrained tasks. This paper presents a comprehensive and systematic survey of recent advancements in applying foundation models to robot perception, planning, and control. It introduces the key concepts and terminology associated with foundation models, providing a clear understanding for researchers in robotics and control engineering. The relevant studies are categorized based on how foundation models are utilized in various elements of robotic autonomy, focusing on 1) perception and situational awareness: object detection and classification, semantic understanding, mapping, and navigation; 2) decision making and task planning: mission understanding, task decomposition and coordination, planning with symbolic and learning-based approaches, plan validation and correction, and LLM-robot interaction; 3) motion planning and control: motion planning, control command and reward generation, and trajectory generation and optimization with diffusion models. Furthermore, the survey covers essential environmental setups, including real-world and simulation datasets and platforms used in training and validating these models. It concludes with a discussion of current challenges such as robustness, explainability, data scarcity, and real-time performance, and highlights promising future directions, including retrieval-augmented generation, on-device foundation models, and explainability.
This survey aims to systematically summarize the latest research trends in applying foundation models to robotics, bridging the gap between the state-of-the-art in artificial intelligence and robotics. By sharing knowledge and resources, this survey is expected to foster the introduction of a new research paradigm for building generalized and autonomous robots.

Keywords: Decision making, foundation models, large language models (LLMs), motion planning, perception, robotic autonomy, task planning, vision-language models (VLMs).


