Link to lecture recording on YouTube
Date: 2024-11-18
Speaker: Percy Liang
Over the last few years, the capabilities of foundation models have skyrocketed, but openness and access to these models have plummeted almost symmetrically
we used to have access to the full paper, weights, code, and data; now all we have is an API
| Timeline | Technology | Research |
|---|---|---|
| 1990s | Internet (text in digital form) | statistical NLP methods |
| 2010s | crowdsourcing platforms e.g., Amazon Mechanical Turk | organic, large-scale annotated datasets e.g., ImageNet, SQuAD1 |
| 2010s | GPUs | deep learning methods |
Types of access:
| Type | Analogy | Opportunity |
|---|---|---|
| API access | cognitive scientist: measure behavior of a black box (prompt → response); cannot look inside the black box | build agents to solve complex problems |
| Open-weight access | neuroscientists: probe internal activations | understand mechanisms, create novel derivatives such as distillations and fine-tunes |
| Open-source | computer scientist: build a system and control every part of it | question everything (dataset, model architecture, training procedure etc.) |
Agent architecture:
Each of the following is powered by one of these black box LLM APIs: perceive, retrieve, reflect, act, plan
Two types of agents: Problem-solving agents and simulation agents
Problem-solving agents:
| Application | Examples | Details | Remarks |
|---|---|---|---|
| Research | MLAgentBench2 | Build an LLM agent to do<ul><li>task: build the best machine learning model</li><li>given: a machine learning problem, some data, starter code, evaluator for test accuracy</li><li>human action loop: write and run some code, think and revise based on what happened</li></ul>Self-improvement: solve task → improve model → solve task better | Related work: MLE-Bench3, AIDE4, OpenHands (OpenDevin)5, CORE-Bench6, generating novel research ideas7 |
| Cybersecurity | Cybench8 | The agent has access to a running server, the code running on that server, and a Bash shell. It reads the code, identifies a potential security vulnerability, and exploits it by running Bash commands | Reflections: dual implications of cybersecurity agent:<ul><li>quantified evaluation of cyber risk (offense)</li><li>penetration testing tool (defense): identify and break code before deployment</li></ul> |
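The action loop in the Research row above (write and run some code, then revise based on what happened) can be sketched as a small propose, run, revise loop. This is only an illustration under stated assumptions: `run_agent_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, the "code" the agent writes is reduced to a single number, and a hand-written rule stands in for the LLM API call. It is not how MLAgentBench itself is implemented.

```python
from typing import Callable, List, Tuple

# Each history entry is (candidate, score): what was tried and how it went.
History = List[Tuple[float, float]]


def run_agent_loop(
    propose: Callable[[History], float],
    evaluate: Callable[[float], float],
    max_steps: int,
) -> Tuple[float, float]:
    """Repeat propose -> run -> record, then return the best (candidate, score)."""
    history: History = []
    for _ in range(max_steps):
        candidate = propose(history)         # stand-in for "write some code"
        score = evaluate(candidate)          # stand-in for "run it and read the result"
        history.append((candidate, score))   # kept so the next proposal can revise
    return max(history, key=lambda pair: pair[1])


def toy_propose(history: History) -> float:
    """Hypothetical stand-in for an LLM call: halve the best candidate so far."""
    if not history:
        return 1.0
    best_candidate, _ = max(history, key=lambda pair: pair[1])
    return best_candidate / 2


def toy_evaluate(candidate: float) -> float:
    """Hypothetical evaluator: a made-up score that peaks at candidate = 0.125."""
    return 1.0 - (candidate - 0.125) ** 2


best = run_agent_loop(toy_propose, toy_evaluate, max_steps=6)
print(best)
```

In a real agent, `propose` would send the task description and the history to a model and parse code out of the response, and `evaluate` would execute that code against the provided evaluator.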
Simulation agents: a virtual world called Smallville9
Simulations of real people10: interviews capture a tremendous amount of richness
Reflections on agents and API access:
Open-weight: dual-use foundation models with widely available weights [executive order on the safe, secure and trustworthy]; the licenses of these models carry restrictions
Reproducibility: API models get deprecated once in a while, bad for reproducibility
open-weights can be stored to reproduce experiments done before
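One practical detail when storing weights for later reproduction is recording a checksum, so that a later run can confirm it loads exactly the same file. The sketch below is an assumption about how one might do this, not something described in the lecture; `sha256_of_file` and the file name are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage with a small stand-in file (real weight files are far larger).
with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "weights.bin")  # hypothetical file name
    with open(weights_path, "wb") as f:
        f.write(b"\x00\x01\x02\x03")
    recorded = sha256_of_file(weights_path)          # store next to the experiment log
    # Later: recompute and compare before re-running the experiment.
    print(sha256_of_file(weights_path) == recorded)
```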
Transluce model investigator11: open the hood and look at the high-activation neurons
Research12 at NVIDIA: take a 15-billion-parameter model, prune away some of the layers, and obtain 8B and 4B models with minimal drop in performance
Adversarial attacks13:
Model independence tests14: check whether two models were independently trained or not;
If they are not, was one fine-tuned from another, or were both fine-tuned from a common model
Idea 1: compute the similarity between the two; e.g., cosine similarity of MLP weights
Problem: what is the level of statistical significance, and does this threshold depend on the model architecture?
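Idea 1 can be made concrete with a minimal sketch. It assumes the two weight matrices have the same shape and are given as nested lists; `cosine_similarity` is a hypothetical helper name. The code returns a raw number only, which is exactly the problem noted above: nothing here says how large a value is statistically significant.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity(w1: Matrix, w2: Matrix) -> float:
    """Cosine similarity between two same-shaped weight matrices, flattened."""
    a = [x for row in w1 for x in row]
    b = [x for row in w2 for x in row]
    if len(a) != len(b):
        raise ValueError("weight matrices must have the same number of entries")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero matrix")
    return dot / (norm_a * norm_b)


w_a = [[1.0, 2.0], [3.0, 4.0]]
w_b = [[2.0, 4.0], [6.0, 8.0]]    # w_a scaled by 2: similarity close to 1
w_c = [[-2.0, 1.0], [-4.0, 3.0]]  # orthogonal to w_a: similarity 0
print(cosine_similarity(w_a, w_b))
print(cosine_similarity(w_a, w_c))
```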
Idea 2: train a bunch of models { sim( θ₁’, θ₂ ): θ₁’ = train( random init ) }
p-value = P [ sim( θ₁’, θ₂ ) > sim( θ₁, θ₂ ) ]
Problem: impossible to retrain to get θ₁’, since only the final weights are available; furthermore, even if the training procedure were known, nobody would spend millions of dollars training the model over again just to run this test
Idea 3: leverage the natural symmetries of the model
perm(θ) = permute the hidden units defined by θ to get counterfactuals
p-value = P [ sim( perm(θ₁), θ₂ ) > sim( θ₁, θ₂ ) ]
In addition to indicating whether two models are independent, the test tells which layers of one model were derived from which layers of the other
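Idea 3 can be sketched on a single weight matrix with one row per hidden unit. The reasoning is that if θ₁ was trained independently of θ₂, the ordering of its hidden units is arbitrary, so a permuted copy should be as similar to θ₂ as the original is. The code below is a toy illustration under several assumptions: cosine similarity is used as `sim`, only one matrix is permuted (in a full model the matching columns of the next layer would be permuted too), and the p-value is estimated by Monte Carlo with the common (1 + count) / (1 + n) correction. The function names are hypothetical, and this is not the exact statistic from the cited paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # one row per hidden unit


def cosine_similarity(w1: Matrix, w2: Matrix) -> float:
    """Cosine similarity between two same-shaped matrices, flattened."""
    a = [x for row in w1 for x in row]
    b = [x for row in w2 for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def permute_hidden_units(w: Matrix, rng: random.Random) -> Matrix:
    """perm(θ): return a copy of w with its rows (hidden units) shuffled."""
    rows = list(w)
    rng.shuffle(rows)
    return rows


def permutation_p_value(w1: Matrix, w2: Matrix, n_perm: int = 200, seed: int = 0) -> float:
    """Estimate P[sim(perm(θ1), θ2) >= sim(θ1, θ2)] over random permutations."""
    rng = random.Random(seed)
    observed = cosine_similarity(w1, w2)
    exceed = sum(
        cosine_similarity(permute_hidden_units(w1, rng), w2) >= observed
        for _ in range(n_perm)
    )
    return (1 + exceed) / (1 + n_perm)


# Toy data: a "fine-tune" is the base matrix plus small noise.
data_rng = random.Random(1)
base = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
fine_tuned = [[x + data_rng.gauss(0, 0.05) for x in row] for row in base]
unrelated = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]

p_dependent = permutation_p_value(base, fine_tuned)    # expected to be very small
p_independent = permutation_p_value(base, unrelated)   # no reason to be small
print(p_dependent, p_independent)
```

A small p-value for the first pair is evidence against independence. The smallest value this estimate can report is 1 / (1 + n_perm), so more permutations are needed to claim stronger significance.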
Reflections on open-weight access:
There are a lot of open source language model efforts, though at the present moment they are far weaker than the strongest open-weight or API models
Historical context of free and open-source software
| Timeline | Event |
|---|---|
| 1983 | Richard Stallman started GNU (bash, ls, …) |
| 1991 | Linus Torvalds started Linux |
| 1998 | Open Source Initiative (OSI) - coined and defined “open-source” |
An open-source AI is an AI system made available under terms and in a way that grant the freedom to:
Examples of speaker’s research team’s projects about learning algorithm, architecture and data recipe, where training models from scratch is needed:
Would the results hold if we scaled up? / Where do we get the compute?
Goal: understand data, architecture → model behavior (hard even with full access)
1. Pranav Rajpurkar et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 [cs.CL]. 2016.
2. Qian Huang et al. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation. arXiv:2310.03302 [cs.LG]. 2024.
3. Jun Shern Chan et al. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. arXiv:2410.07095 [cs.CL]. 2024.
4. Zhengyao Jiang et al. AIDE: AI-Driven Exploration in the Space of Code. arXiv:2502.13138 [cs.AI]. 2025.
5. Xingyao Wang et al. OpenHands: An Open Platform for AI Software Developers as Generalist Agents. arXiv:2407.16741 [cs.SE]. 2024.
6. Zachary S. Siegel et al. CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark. arXiv:2409.11363 [cs.CL]. 2024.
7. Chenglei Si, Diyi Yang, Tatsunori Hashimoto. Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. arXiv:2409.04109 [cs.CL]. 2024.
8. Andy K. Zhang et al. Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models. arXiv:2408.08926 [cs.CR]. 2024.
9. Joon Sung Park et al. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC]. 2023.
10. Joon Sung Park et al. Generative Agent Simulations of 1,000 People. arXiv:2411.10109 [cs.AI]. 2024.
11. Transluce: Monitor: An AI-Driven Observability Interface
12. Saurav Muralidharan et al. Compact Language Models via Pruning and Knowledge Distillation. arXiv:2407.14679 [cs.CL]. 2024.
13. Andy Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043 [cs.CL]. 2023.
14. Sally Zhu et al. Independence Tests for Language Models. arXiv:2502.12292 [cs.LG]. 2025.
15. Sang Michael Xie et al. DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. arXiv:2305.10429 [cs.CL]. 2023.
16. Hong Liu et al. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. arXiv:2305.14342 [cs.LG]. 2023.
17. John Hewitt et al. Backpack Language Models. arXiv:2305.16765 [cs.CL]. 2023.
18. Binhang Yuan et al. Decentralized Training of Foundation Models in Heterogeneous Environments. arXiv:2206.01288 [cs.DC]. 2022.
19. Arthur Douillard et al. DiLoCo: Distributed Low-Communication Training of Language Models. arXiv:2311.08105 [cs.LG]. 2023.