Introduction to Embodied AI

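Course number: 04834020
Credits: 3
Prerequisites: Computer Vision, Deep Learning
Offering department: School of Information Science and Technology (信息科学技术学院)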
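Chinese description (translated): With the rapid development of deep learning and large-model technology, artificial intelligence has achieved major breakthroughs in visual perception and scene understanding. Intelligence for interacting with the physical world, such as grasping a wide variety of objects or teaching a robot to walk on uneven terrain, has instead become the bottleneck on the path to general intelligence. Embodied intelligence studies exactly this kind of intelligent system, one that perceives and acts through a physical body, and it has attracted broad attention from academia and industry in recent years. Unlike the traditional paradigm of learning from internet data, an embodied agent acquires information, understands problems, makes decisions, and carries out actions through its body's physical interaction with the environment, and thereby produces intelligent behavior, autonomy, and adaptability. Embodied intelligence is therefore widely regarded as an indispensable research paradigm on the road to artificial general intelligence, a key research area driving the capabilities of humanoid, quadruped, and other robots, and a frontier field at the intersection of robotics, machine learning, computer vision, natural language processing, and computer graphics. The course first covers basic robotics to lay the groundwork for embodied intelligence. It then starts from the robot's visual modality, 3D vision, and explains how deep learning is used to build modular robotic systems and achieve generalizable grasping. Next, it introduces end-to-end embodied intelligence tasks such as legged locomotion and dexterous-hand manipulation, using deep learning, reinforcement learning, imitation learning, and related methods. Given the importance of large models for general perception and language interaction in embodied agents, the course introduces the pretrained model CLIP and multimodal large models such as GPT-4V/4o, and explains how to build embodied large models that are either modular or end-to-end. Finally, the course offers experiments on real humanoid or legged robots, giving students the chance to deploy their assignments on real hardware and to understand in depth both Sim2Real techniques and the challenges of physical robots. Overall, the course covers the important tasks and methods in vision-based robot control and interaction and aims to explore this frontier field in both depth and breadth.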
English description: With the rapid development of deep learning and large-model technologies, artificial intelligence has made significant breakthroughs in visual perception and scene understanding. However, intelligence for interacting with the physical world, such as grasping diverse objects or enabling robots to traverse uneven terrain, has become a bottleneck on the path toward general intelligence. Embodied intelligence, which studies intelligent systems that perceive and act through a physical body, has recently attracted significant attention from academia and industry. Unlike traditional research paradigms that rely on internet data for learning, embodied agents acquire information, understand problems, make decisions, and take actions through physical interaction with their environment, thus exhibiting intelligent behavior, autonomy, and adaptability. Embodied intelligence is therefore widely regarded as an essential research paradigm for advancing general AI. It is key to evolving the capabilities of humanoid and quadruped robots, and it is a frontier interdisciplinary field combining robotics, machine learning, computer vision, natural language processing, and computer graphics. This course covers foundational robotics topics and presents a range of embodied intelligence tasks, such as legged locomotion and dexterous hand manipulation, using methods such as deep-learning-based 3D vision, reinforcement learning, and imitation learning. It also introduces multimodal large models such as GPT-4V/4o and explores how to build modular or end-to-end embodied intelligence models. The course offers real-world experiments with humanoid or legged robots, allowing students to deploy their assignments on real robots and gain a deep understanding of Sim2Real techniques and the challenges of working with physical systems.
Overall, the course will cover various essential tasks and methods in vision-based robotic control and interaction, aiming for an in-depth and broad exploration of this cutting-edge field.