ucb_agentic_ai

Lecture 10: AlphaStar Revisited: Multi-Agent Systems in the Era of LLMs

Link to lecture recording on YouTube

Date: 2025-11-17

Speaker: Oriol Vinyals

Education:

Ph.D. in Computer Science, 2009-2013, University of California, Berkeley
M.Sc. in Computer Science, 2008-2009, University of California, San Diego
Bachelor’s degree in Mathematics, 2001-2007, Polytechnic University of Catalonia (Universitat Politècnica de Catalunya)

Work:

Goal of reinforcement learning: select actions to maximize future rewards
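
As a rough illustration of "future rewards", the quantity being maximized is often written as a discounted sum of rewards over an episode. The sketch below assumes a discount factor `gamma` and a made-up sparse reward sequence; neither comes from the lecture.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for one episode.

    `gamma` is an assumed discount factor, not a value from the lecture.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical sparse-reward episode: nothing until a final reward of 1.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))  # 0.25
```

With `gamma=1.0` this reduces to the plain sum of rewards; values below 1 weight near-term rewards more heavily.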

A curriculum of games of increasing difficulty (complexity increases from top to bottom):

Game	Information type	Players	Action space	Moves per game
Atari	Near-perfect	Single-player	17	100s
Go	Perfect	Multiplayer	361	100s
StarCraft	Imperfect	Multiplayer	~10²⁶	1000s

AlphaStar played non-anonymously against attendees with a standard gaming mouse and keyboard at BlizzCon (Nov 2019). Link to video of AlphaStar vs. Serral.

Pretraining: supervised learning - learning from human data

Post-training: reinforcement learning - happens in a massive multi-agent system

Autoregressive mode: a small recurrent neural network structured so that each action argument lines up with its own softmax

fully autoregressive action head with 7 sub-heads: $p(x) = \prod_{i=1}^{n} p(x_{i} \mid x_{1}, \dots, x_{i-1})$
four scalar heads: action type, action delay, action repeat, modifier key
a recurrent pointer network to select a set of units¹
a simple pointer network to select single units
a ResNet decoder to select points on the map
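
The factorization above can be sketched as sampling one argument at a time, with each head seeing the arguments chosen so far. This is a minimal toy under stated assumptions, not AlphaStar's implementation: `sample_action`, `action_type_head`, and `unit_head` are hypothetical names, the logits are made up, and plain functions stand in for the neural sub-heads.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(heads, rng):
    """Sample x_1..x_n head by head; return (choices, log p(x))."""
    chosen = []
    log_prob = 0.0
    for head in heads:
        # Each head conditions on earlier choices: p(x_i | x_1, ..., x_{i-1}).
        probs = softmax(head(chosen))
        idx = rng.choices(range(len(probs)), weights=probs)[0]
        # The log of the product of conditionals is the sum of their logs.
        log_prob += math.log(probs[idx])
        chosen.append(idx)
    return chosen, log_prob


# Hypothetical toy heads (assumptions for illustration only).
def action_type_head(prev):
    return [0.0, 1.0, 2.0]  # three made-up action types


def unit_head(prev):
    # Logits over two made-up units depend on the sampled action type.
    return [float(prev[0]), 0.0]


action, logp = sample_action([action_type_head, unit_head], random.Random(0))
print(action, logp)
```

Because every head has its own softmax, the joint log-probability of a full action is just the sum of the per-head log-probabilities, which is the quantity a policy-gradient update would need.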

[Incomplete, work in progress]
