International Journal of Control, Automation, and Systems 2024; 22(8): 2341-2384
https://doi.org/10.1007/s12555-024-0438-7
© The International Journal of Control, Automation, and Systems
The advancement of foundation models, such as large language models (LLMs), vision-language models (VLMs), diffusion models, and robotics foundation models (RFMs), has established a new paradigm in robotics, offering innovative approaches to the long-standing challenge of building robot autonomy. These models enable the development of robotic agents that can independently understand and reason about semantic contexts, plan actions, physically interact with surroundings, and adapt to new environments and untrained tasks. This paper presents a comprehensive and systematic survey of recent advancements in applying foundation models to robot perception, planning, and control. It introduces the key concepts and terminology associated with foundation models, providing a clear understanding for researchers in robotics and control engineering. The relevant studies are categorized based on how foundation models are utilized in various elements of robotic autonomy, focusing on 1) perception and situational awareness: object detection and classification, semantic understanding, mapping, and navigation; 2) decision making and task planning: mission understanding, task decomposition and coordination, planning with symbolic and learning-based approaches, plan validation and correction, and LLM-robot interaction; 3) motion planning and control: motion planning, control command and reward generation, and trajectory generation and optimization with diffusion models. Furthermore, the survey covers essential environmental setups, including real-world and simulation datasets and platforms used in training and validating these models. It concludes with a discussion of current challenges such as robustness, explainability, data scarcity, and real-time performance, and highlights promising future directions, including retrieval-augmented generation, on-device foundation models, and explainability.
This survey aims to systematically summarize the latest research trends in applying foundation models to robotics, bridging the gap between the state of the art in artificial intelligence and robotics. By sharing knowledge and resources, this survey is expected to foster a new research paradigm for building generalized and autonomous robots.
Keywords: Decision making, foundation models, large language models (LLMs), motion planning, perception, robotic autonomy, task planning, vision-language models (VLMs).
Published online August 1, 2024
Dae-Sung Jang, Doo-Hyun Cho, Woo-Cheol Lee, Seung-Keol Ryu, Byeongmin Jeong, Minji Hong, Minjo Jung, Minchae Kim, Minjoon Lee, SeungJae Lee, and Han-Lim Choi*
KAIST