ucb_agentic_ai

Lecture 07: Multimodal Agents - From Perception to Action

Link to lecture recording on YouTube

Date: 2025-03-17

Speaker: Caiming Xiong

Speaker’s social profile: Website / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Ph.D. in Computer Science and Engineering, 2008-2014, State University of New York at Buffalo
B.S. and M.S. in Computer Science, 2001-2007, Huazhong University of Science and Technology

Work:

Frontier foundation models are powerful and improving rapidly, and on some tasks they already surpass humans.

Multimodal agents (e.g., coding agents, web agents, physical agents):

computer tasks often involve multiple apps and interfaces
powered by advancements in large vision-language-action models (VLA-Ms)
make digital interactions more accessible and vastly increase human productivity
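
As an illustration of the perception-to-action idea in the lecture title, the sketch below runs a toy observe-decide-act loop over a task that spans two apps. Everything in it is an assumption made for illustration and is not taken from the lecture: `ToyDesktop`, `Observation`, `Action`, `policy`, and `run_agent` are hypothetical names, the "screen" is a plain string rather than a screenshot, and the rule-based `policy` only stands in for a vision-language-action model.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    app: str          # which application is in the foreground
    screen_text: str  # stand-in for a screenshot or accessibility tree


@dataclass
class Action:
    kind: str         # "click", "switch_app", or "done"
    target: str = ""


@dataclass
class ToyDesktop:
    """Hypothetical two-app environment; not a real automation API."""
    app: str = "browser"
    copied: bool = False
    pasted: bool = False

    def observe(self) -> Observation:
        # Perception: render the current app state as text.
        if self.app == "browser":
            text = "price copied" if self.copied else "price: $42 [Copy]"
        else:
            text = "cell A1 filled" if self.pasted else "cell A1 empty [Paste]"
        return Observation(self.app, text)

    def step(self, action: Action) -> None:
        # Action: apply the agent's choice to the environment.
        if action.kind == "switch_app":
            self.app = action.target
        elif action.kind == "click" and action.target == "Copy":
            if self.app == "browser":
                self.copied = True
        elif action.kind == "click" and action.target == "Paste":
            if self.app == "spreadsheet" and self.copied:
                self.pasted = True


def policy(obs: Observation) -> Action:
    """Rule-based stand-in for a vision-language-action model (assumption)."""
    if obs.app == "browser":
        if "[Copy]" in obs.screen_text:
            return Action("click", "Copy")
        return Action("switch_app", "spreadsheet")
    if "[Paste]" in obs.screen_text:
        return Action("click", "Paste")
    return Action("done")


def run_agent(env: ToyDesktop, max_steps: int = 10) -> list:
    """Observe, decide, act; stop when the policy reports it is done."""
    trace = []
    for _ in range(max_steps):
        action = policy(env.observe())
        trace.append(action)
        if action.kind == "done":
            break
        env.step(action)
    return trace


if __name__ == "__main__":
    for action in run_agent(ToyDesktop()):
        print(action.kind, action.target)
```

The loop structure would stay the same if `policy` were replaced by a model call and `observe` returned pixels; only those two pieces are placeholders here.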

[Incomplete, work in progress]
