ucb_agentic_ai

Lecture 06: Multimodal Autonomous AI Agents

Link to lecture recording on YouTube

Date: 2025-03-10

Speaker: Ruslan Salakhutdinov

Speaker’s social profile: Website / Google Scholar / GitHub / LinkedIn / X (Twitter)

Work:

Notes

Autonomous AI agents: many opportunities to automate menial tasks

Web agents: foundation model + text understanding (HTML) + visual encoder + web grounding
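As a rough illustration of how these four components might fit together in a single agent step, here is a minimal sketch. It is not the lecture's or any paper's implementation: the names `Element`, `parse_elements`, `build_prompt`, `ground_action`, and `agent_step` are hypothetical, the visual encoder and foundation model are passed in as plain callables, and the only action assumed is `click [id]`.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, List, Optional


@dataclass
class Element:
    """A clickable page element with the ID the model can refer to."""
    element_id: int
    tag: str
    text: str


class _InteractiveElementParser(HTMLParser):
    """Text understanding (HTML): collect links and buttons from raw HTML."""

    TAGS = {"a", "button"}

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []
        self._open_tag: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self._open_tag = tag
            self.elements.append(Element(len(self.elements), tag, ""))

    def handle_data(self, data):
        # Only keep text that sits inside a currently open link or button.
        if self._open_tag is not None:
            self.elements[-1].text += data.strip()

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._open_tag = None


def parse_elements(html: str) -> List[Element]:
    parser = _InteractiveElementParser()
    parser.feed(html)
    return parser.elements


def build_prompt(goal: str, elements: List[Element], caption: str) -> str:
    """Combine the goal, the visual summary, and the element list into text."""
    lines = [f"Goal: {goal}", f"Screenshot: {caption}", "Elements:"]
    lines += [f"[{e.element_id}] <{e.tag}> {e.text}" for e in elements]
    return "\n".join(lines)


def ground_action(action: str, elements: List[Element]) -> Element:
    """Web grounding: resolve 'click [id]' to a concrete page element."""
    verb, _, arg = action.partition(" ")
    if verb != "click":
        raise ValueError(f"unsupported action: {action!r}")
    return elements[int(arg.strip("[]"))]


def agent_step(
    goal: str,
    html: str,
    screenshot: bytes,
    visual_encoder: Callable[[bytes], str],    # assumption: image -> caption
    foundation_model: Callable[[str], str],    # assumption: prompt -> action
) -> Element:
    elements = parse_elements(html)
    caption = visual_encoder(screenshot)
    prompt = build_prompt(goal, elements, caption)
    action = foundation_model(prompt)
    return ground_action(action, elements)
```

Passing the encoder and the model as callables keeps the sketch independent of any particular model API; in practice each would wrap a real model call, and the action space would cover more than clicks.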

VisualWebArena [1]

Tree Search [2]

Internet-Scale Training (InSTA) [3]

[Incomplete, work in progress]

References

  1. Jing Yu Koh et al. VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks. arXiv:2401.13649 [cs.LG]. 2024.

  2. Jing Yu Koh et al. Tree Search for Language Model Agents. arXiv:2407.01476 [cs.AI]. 2024.

  3. Brandon Trabucco et al. InSTA: Towards Internet-Scale Training For Agents. arXiv:2502.06776 [cs.LG]. 2025.