ucb_agentic_ai

Lecture 07: Multimodal Agents - From Perception to Action

Link to lecture recording on YouTube

Date: 2025-03-17

Speaker: Caiming Xiong

Speaker’s social profiles: Website / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Work:

Notes

Frontier foundation models are already powerful and are improving rapidly; on some tasks they now surpass human performance

Building multimodal agents (e.g., coding agents, web agents, physical agents) requires progress on three fronts:

Environment / Benchmark: should be reconfigurable and expandable

Data: diverse modalities, large-scale, covering a wide range of tasks

Model / System: unified vision-language-reasoning-action model, and long-context inference
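To make the three items above concrete, here is a minimal sketch of a perception-to-action loop in Python. It is an illustration under stated assumptions, not code from the lecture: `Observation`, `ENV_REGISTRY`, `register_env`, `CounterEnv`, `toy_policy`, and `run_episode` are all hypothetical names. The registry stands in for an "expandable" benchmark, the constructor argument for a "reconfigurable" task, and the policy function for a unified vision-language-reasoning-action model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Observation:
    """One multimodal observation. Field names are hypothetical."""
    text: str = ""
    image: Optional[bytes] = None  # e.g. a screenshot or camera frame


# Hypothetical registry: new environments are added by registering a factory,
# so the benchmark can grow without changes to the agent loop.
ENV_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_env(name: str):
    """Decorator that records an environment factory under `name`."""
    def decorator(factory):
        ENV_REGISTRY[name] = factory
        return factory
    return decorator


@register_env("counter")
class CounterEnv:
    """Toy environment: the episode ends after `target` increment actions."""

    def __init__(self, target: int = 3):
        self.target = target  # reconfigurable task parameter
        self.count = 0

    def reset(self) -> Observation:
        self.count = 0
        return Observation(text=f"count=0 target={self.target}")

    def step(self, action: str) -> Tuple[Observation, bool]:
        if action == "increment":
            self.count += 1
        done = self.count >= self.target
        return Observation(text=f"count={self.count} target={self.target}"), done


def toy_policy(obs: Observation) -> str:
    """Stand-in for a model that maps an observation to an action."""
    return "increment"


def run_episode(env_name: str, policy, max_steps: int = 10, **config) -> int:
    """Run one episode and return the number of steps taken."""
    env = ENV_REGISTRY[env_name](**config)
    obs = env.reset()
    for step in range(1, max_steps + 1):
        obs, done = env.step(policy(obs))
        if done:
            return step
    return max_steps


if __name__ == "__main__":
    print(run_episode("counter", toy_policy, target=3))
```

In this sketch, swapping in a different task means registering another class with `register_env`, and changing task difficulty means passing a different `target`. The agent loop in `run_episode` stays the same in both cases.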

[Incomplete, work in progress]

References