ucb_agentic_ai

Lecture 10: AlphaStar Revisited: Multi-Agent Systems in the Era of LLMs

Link to lecture recording on YouTube

Date: 2025-11-17

Speaker: Oriol Vinyals

Speaker’s Social Profile: Company Profile / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Work:

Notes

Agent Basics & AlphaStar

Goal of reinforcement learning: select actions to maximize future rewards
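"Future rewards" is usually made precise as the discounted return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + …, which the agent tries to maximize in expectation. Below is a minimal sketch of that quantity. The discount factor γ (`gamma`) is an illustrative assumption, because the lecture only says "future rewards".

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return G_0 = r_1 + gamma * r_2 + gamma**2 * r_3 + ...

    rewards[k] is the reward received after step k. The discount factor
    gamma is an assumption for illustration, not a value from the lecture.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A win/loss game gives a sparse reward only at the very end.
print(discounted_return([0.0, 0.0, 0.0, 1.0], gamma=0.5))  # 0.125
```

With γ = 1 this reduces to the plain sum of rewards. With γ < 1, a win that arrives later is worth less, which is one common way to encode a preference for reaching rewards sooner.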

A curriculum of games ordered by difficulty (complexity increases from top to bottom):

Game      | Information type | Players       | Action space | Moves per game
Atari     | Near-perfect     | Single player | 17           | 100s
Go        | Perfect          | Multiplayer   | 361          | 100s
StarCraft | Imperfect        | Multiplayer   | ~10^26       | 1000s

At BlizzCon (November 2019), AlphaStar played non-anonymously against attendees, who used a standard gaming mouse and keyboard. Link to video of AlphaStar vs. Serral.

AlphaStar & LLMs

Pretraining: supervised learning, i.e., learning from human data.
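Supervised learning from human data can be sketched as behavior cloning: minimize the negative log-likelihood of the action the human actually took, under the policy's predicted distribution. This is a minimal sketch that assumes the policy's per-step probabilities are already computed. The function name and input layout are hypothetical.

```python
import math
from typing import Sequence


def behavior_cloning_loss(policy_probs: Sequence[Sequence[float]],
                          human_actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of human actions under the policy.

    policy_probs[t] is the policy's distribution over actions at step t
    (hypothetical input format); human_actions[t] is the index of the
    action the human took at that step.
    """
    if len(policy_probs) != len(human_actions) or not human_actions:
        raise ValueError("need one probability vector per human action")
    nll = 0.0
    for probs, action in zip(policy_probs, human_actions):
        nll -= math.log(probs[action])
    return nll / len(human_actions)


# Two steps of a hypothetical replay with two possible actions per step.
print(behavior_cloning_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1]))
```

The loss is zero only when the policy puts all its probability on the human's choice at every step. Minimizing it pushes the policy toward imitating the data, which then serves as the starting point for reinforcement learning.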

Post-training: reinforcement learning, carried out in a massive multi-agent system.
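One ingredient of RL in a multi-agent system is how a learner chooses its opponents. The sketch below samples opponents from a pool in proportion to how hard they are for the learner. It is loosely modeled on the prioritized fictitious self-play used in AlphaStar's league. The (1 - p) ** power weighting, the exponent, and the opponent names are illustrative assumptions, not the exact setup from the lecture.

```python
import random
from typing import Dict, Optional


def pfsp_weights(win_rates: Dict[str, float], power: float = 2.0) -> Dict[str, float]:
    """Turn win rates into sampling probabilities that favor hard opponents.

    win_rates maps opponent name -> estimated P(learner beats opponent).
    Weighting by (1 - p) ** power is an illustrative assumption.
    """
    raw = {name: (1.0 - p) ** power for name, p in win_rates.items()}
    total = sum(raw.values())
    if total == 0.0:
        # The learner beats every opponent: fall back to uniform sampling.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


def sample_opponent(win_rates: Dict[str, float], power: float = 2.0,
                    rng: Optional[random.Random] = None) -> str:
    """Draw one opponent name according to pfsp_weights."""
    rng = rng or random.Random()
    weights = pfsp_weights(win_rates, power)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical pool: estimated probability that the learner wins each matchup.
pool = {"past_self_v1": 0.9, "main_exploiter": 0.5, "league_exploiter": 0.3}
print(pfsp_weights(pool))
print(sample_opponent(pool, rng=random.Random(0)))
```

Focusing play on opponents the learner still loses to keeps training from wasting games on matchups it has already solved. A larger `power` makes that focus sharper.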

Autoregressive model: a small recurrent neural network structured so that each action argument is predicted by its own softmax, one after another, with each choice conditioned on the arguments already chosen.
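The idea of one softmax per argument, chained in sequence, can be sketched on a toy two-argument action: first an action type, then a target that depends on that type. This is a minimal sketch under stated assumptions. The recurrent network is replaced by fixed, hypothetical logit tables so the example stays self-contained, and the action names are invented for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy action space: an action type, then a target that depends on it.
ACTION_TYPES = ["move", "attack", "build"]
TARGETS = {
    "move": ["north", "south"],
    "attack": ["enemy_base", "enemy_army"],
    "build": ["barracks", "supply_depot"],
}
# Hypothetical stand-ins for network outputs: fixed logits instead of a trained RNN.
TYPE_LOGITS = [0.5, 1.0, 0.2]
TARGET_LOGITS = {"move": [0.0, 0.0], "attack": [2.0, 0.0], "build": [0.0, 1.0]}


def joint_prob(action_type: str, target: str) -> float:
    """p(type, target) = p(type) * p(target | type)."""
    i = ACTION_TYPES.index(action_type)
    j = TARGETS[action_type].index(target)
    return softmax(TYPE_LOGITS)[i] * softmax(TARGET_LOGITS[action_type])[j]


def sample_action(rng: random.Random) -> Tuple[Tuple[str, str], float]:
    """Sample one argument at a time; return the action and its log-probability."""
    type_probs = softmax(TYPE_LOGITS)
    i = rng.choices(range(len(ACTION_TYPES)), weights=type_probs)[0]
    action_type = ACTION_TYPES[i]
    # The second softmax is conditioned on the first choice.
    target_probs = softmax(TARGET_LOGITS[action_type])
    j = rng.choices(range(len(target_probs)), weights=target_probs)[0]
    target = TARGETS[action_type][j]
    # Chain rule: log p(type, target) = log p(type) + log p(target | type).
    log_prob = math.log(type_probs[i]) + math.log(target_probs[j])
    return (action_type, target), log_prob


print(sample_action(random.Random(0)))
```

Because each argument gets its own softmax, the model defines a distribution over the joint space without ever enumerating it. Here the joint space has only 3 × 2 = 6 combinations, but the same factorization is what makes this structure attractive when the number of combinations is astronomically large.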

[Incomplete, work in progress]

References

  1. Pointer Networks