ucb_agentic_ai

Lecture 10: Open-Source and Science in the Era of Foundation Models

Link to lecture recording on YouTube

Date: 2024-11-18

Speaker: Percy Liang

Speaker’s Social Profile: Website / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Ph.D. in Computer Science, 2005-2011, University of California, Berkeley, advised by Prof. Michael Jordan and Prof. Dan Klein
M.Eng., 2005, Massachusetts Institute of Technology
B.S., 2000-2004, Massachusetts Institute of Technology

Work:

Associate Professor of Computer Science, Stanford University
Associate Professor (by courtesy) in Statistics, Stanford University

Notes

Over the last few years, the capabilities of foundation models have skyrocketed, but openness and access to these models have plummeted almost symmetrically
we used to have access to the full paper, weights, code, and data; now all we have is an API

Access shapes research

Timeline	Technology	Research
1990s	Internet (text in digital form)	statistical NLP methods
2010s	crowdsourcing platforms (e.g., Amazon Mechanical Turk)	organic, large-scale annotated datasets (e.g., ImageNet, SQuAD¹)
2010s	GPUs	deep learning methods

Types of access:

Type	Analogy	Opportunity
API access	cognitive scientist: measure the behavior of a black box (prompt → response) without being able to look inside it	build agents to solve complex problems
Open-weight access	neuroscientist: probe internal activations	understand mechanisms; create novel derivatives such as distilled and fine-tuned models
Open-source	computer scientist: build a system and control every part of it	question everything (dataset, model architecture, training procedure, etc.)

API access

think of the API as a universal function: given a natural-language instruction as the prompt, it can do different things (e.g., summarize, verify, generate)
compose API calls into larger systems (agents)
important: the API controls the execution flow (rather than being called by a fixed program)
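A minimal sketch of the "API as controller" idea, assuming a hypothetical `call_llm(prompt) -> str` function in place of a real LLM API (here a scripted stub so the example runs offline) and an assumed `ACTION: argument` reply format. The loop has no fixed control flow of its own: the model's reply decides which tool runs next and when to stop.

```python
from typing import Callable, Iterator

# Hypothetical tools the model can choose between; a real agent would wrap
# search, code execution, other API calls, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "summarize": lambda text: text.split(".")[0] + ".",
    "word_count": lambda text: str(len(text.split())),
}


def make_scripted_llm(replies: list[str]) -> Callable[[str], str]:
    """Offline stand-in for an LLM API: ignores the prompt, returns canned replies."""
    it: Iterator[str] = iter(replies)
    return lambda prompt: next(it)


def run_agent(task: str, call_llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Let the model choose each step; this loop only executes its choices."""
    context = f"Task: {task}"
    for _ in range(max_steps):
        reply = call_llm(context)               # e.g. "word_count: some text"
        action, _, arg = reply.partition(":")
        action, arg = action.strip(), arg.strip()
        if action == "FINISH":                  # the model decides when to stop
            return arg
        if action in TOOLS:
            observation = TOOLS[action](arg)    # execute the model-chosen tool
        else:
            observation = f"unknown action {action!r}"
        context += f"\n{reply}\nObservation: {observation}"
    return "stopped: step limit reached"


llm = make_scripted_llm(["word_count: open models enable science", "FINISH: 4 words"])
print(run_agent("How many words are in the claim?", llm))  # prints: 4 words
```

Swapping the stub for a real API changes nothing in `run_agent`: composition into an agent comes from feeding each observation back into the next prompt.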

Agent architecture:

the agent takes observations from the real world
it perceives them and adds them to a memory stream (a record of everything that has happened so far)
it retrieves the relevant memories, which provide context that can be used for three things:
- act in the present: tool use, grounding in the real world, taking various actions
- look backward and reflect on past memories: summarizing and verifying what it has seen so far
- plan for the future: gives the agent a sense of long-term direction, as opposed to simply reacting

Each of these components (perceive, retrieve, reflect, act, plan) is powered by calls to a black-box LLM API
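The architecture can be sketched as a class in which every component is a call to a black-box LLM. `LLM` is a hypothetical `prompt -> response` function, and the prompts and the single-digit relevance format are illustrative assumptions; real systems such as Generative Agents use richer prompts and embedding-based retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]  # hypothetical black-box LLM API: prompt -> response


@dataclass
class Agent:
    llm: LLM
    memory: list[str] = field(default_factory=list)  # memory stream: everything so far

    def perceive(self, observation: str) -> None:
        # Turn a raw observation into a natural-language record in the memory stream.
        self.memory.append(self.llm(f"Describe this observation: {observation}"))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Ask the LLM to score relevance (assumes a single-digit reply; anything else scores 0).
        def score(memory: str) -> int:
            reply = self.llm(f"Rate 0-9 how relevant '{memory}' is to '{query}'.").strip()
            return int(reply) if reply.isdigit() else 0
        return sorted(self.memory, key=score, reverse=True)[:k]

    def reflect(self, memories: list[str]) -> str:
        # Look backward: abstract memories into an insight, which is itself remembered.
        insight = self.llm("Summarize and verify: " + " | ".join(memories))
        self.memory.append(insight)
        return insight

    def plan(self, context: list[str]) -> str:
        # Look forward: a longer-term plan rather than a pure reaction.
        return self.llm("Make a plan given: " + " | ".join(context))

    def act(self, plan: str, context: list[str]) -> str:
        # Act in the present: pick a concrete next action (e.g., a tool call).
        return self.llm(f"Plan: {plan}\nContext: {' | '.join(context)}\nNext action:")

    def step(self, observation: str) -> str:
        self.perceive(observation)
        context = self.retrieve(observation)
        self.reflect(context)
        return self.act(self.plan(context), context)


def stub_llm(prompt: str) -> str:
    # Offline stand-in: fixed relevance score, otherwise echo the start of the prompt.
    return "5" if prompt.startswith("Rate") else prompt[:40]


agent = Agent(stub_llm)
print(agent.step("the door opened"))
print(len(agent.memory))
```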

Two types of agents: Problem-solving agents and simulation agents

Problem-solving agents:

Application	Examples	Details	Remarks
Research	MLAgentBench²	Build an LLM agent whose task is to build the best machine learning model, given a machine learning problem, some data, starter code, and an evaluator for test accuracy; like a human researcher, it loops: write and run some code, think, and revise based on what happened. Self-improvement: solve task → improve model → solve task better	Related work: MLE-Bench³, AIDE⁴, OpenHands (OpenDevin)⁵, CORE-Bench⁶, generating novel research ideas⁷
Cybersecurity	Cybench⁸	The agent has access to a running server, the code running on it, and a Bash shell; it reads the code, identifies a potential security vulnerability, and exploits it by running Bash commands	Reflections: a cybersecurity agent has dual implications: quantified evaluation of cyber risk (offense), and a penetration-testing tool (defense) that identifies and breaks code before deployment
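A toy version of the write → run → evaluate → revise loop in MLAgentBench-style tasks. `propose` is a hypothetical stand-in for the LLM that edits code (here it only rescales a learning rate), and `evaluate` plays the role of the provided test-accuracy evaluator; both are assumptions for illustration.

```python
import random
from typing import Callable

Config = dict[str, float]


def research_loop(
    propose: Callable[[Config, float], Config],
    evaluate: Callable[[Config], float],
    start: Config,
    rounds: int = 20,
) -> tuple[Config, float, list[float]]:
    """Propose a revision, run the evaluator, keep it only if the score improves."""
    best, best_score = start, evaluate(start)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose(best, best_score)   # "write" a revision (an LLM in a real agent)
        score = evaluate(candidate)             # "run" it and measure test accuracy
        if score > best_score:                  # "revise": build on what worked
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


rng = random.Random(0)


def propose(cfg: Config, score: float) -> Config:
    # Hypothetical proposer: randomly rescale the learning rate.
    return {"lr": cfg["lr"] * rng.uniform(0.5, 1.5)}


def evaluate(cfg: Config) -> float:
    # Toy "test accuracy" that peaks at lr = 0.1.
    return 1.0 - abs(cfg["lr"] - 0.1)


best, score, history = research_loop(propose, evaluate, {"lr": 1.0})
print(best, score)
```

Keeping only improvements makes the best score non-decreasing; a real agent would also read error messages and logs when deciding what to change next.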

Simulation agents: a virtual world called Smallville⁹

there are 25 agents, each powered by the LLM-based agent architecture described above
retrieval: most experiences are fairly mundane, but they all go into the memory stream; memories are retrieved based on three criteria: recency, importance (an intrinsic score attached to each memory), and relevance
reflection: takes groups of memories and abstracts them into higher-level observations
together, these components simulate social behavior
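Retrieval can be sketched as a weighted sum of the three normalized criteria. Following our reading of the Generative Agents paper, recency decays exponentially with the game hours since a memory was last accessed (decay factor 0.995), importance is a 1-10 score assigned when the memory is stored, relevance is the cosine similarity between embeddings, and the three terms are min-max normalized and weighted equally; treat these constants and the tiny hand-made embeddings as assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float        # 1-10, rated once (e.g., by the LLM) when stored
    last_access_hour: float  # game time when the memory was last retrieved
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(xs: list[float]) -> list[float]:
    # Rescale to [0, 1]; a constant column carries no ranking signal, so map it to 0.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    now_hour: float,
    k: int = 3,
    decay: float = 0.995,
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> list[Memory]:
    recency = min_max([decay ** (now_hour - m.last_access_hour) for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in memories])
    w_rec, w_imp, w_rel = weights
    scores = [w_rec * r + w_imp * i + w_rel * v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)
    return [memories[j] for j in ranked[:k]]


memories = [
    Memory("ate breakfast", importance=1, last_access_hour=0, embedding=[1.0, 0.0]),
    Memory("is planning a Valentine's Day party", importance=8, last_access_hour=9, embedding=[0.0, 1.0]),
    Memory("walked to the cafe", importance=2, last_access_hour=5, embedding=[1.0, 1.0]),
]
for m in retrieve(memories, query_embedding=[0.0, 1.0], now_hour=10, k=2):
    print(m.text)
```

The mundane "ate breakfast" memory stays in the stream but ranks last: it is old, unimportant, and unrelated to the query.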

Simulations of real people¹⁰: interviews capture a tremendous amount of richness

sampled 1,000 people with diverse demographics
each person took part in a 2-hour audio interview (6,491 words on average), using a script drawn from the American Voices Project
the interview is conducted by an LM agent, and the transcript becomes the seed of the simulated agent's memory

Reflections on agents and API access:

use the API to create agents
solve complex problems in ML engineering and cybersecurity
simulate people (a digital twin of society), providing a lab for social scientists
next: move from static agents to agents that learn from experience (AlphaGo analogy: supervised learning → reinforcement learning)

Open-weight access

Open-weight: dual-use foundation models with widely available weights (terminology from the 2023 U.S. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence); the licenses of these models still impose restrictions

Reproducibility: API models get deprecated from time to time, which is bad for reproducibility
open weights can be stored and used to reproduce earlier experiments

Transluce model investigator¹¹: open the hood and look at the high-activation neurons
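A toy illustration of what looking at high-activation neurons involves, which is only possible with access to a model's internals: run inputs through a ReLU layer and, for a chosen neuron, list the inputs that activate it most. The weights and input names are made up; real interpretability tools go much further than this.

```python
Matrix = list[list[float]]


def hidden_activations(W: Matrix, b: list[float], x: list[float]) -> list[float]:
    # ReLU(Wx + b): internal values an API never exposes, but open weights do.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def top_activating_inputs(
    W: Matrix, b: list[float], inputs: dict[str, list[float]], neuron: int, k: int = 2
) -> list[tuple[str, float]]:
    # Rank inputs by how strongly they activate one neuron (its "exemplars").
    scored = [(name, hidden_activations(W, b, x)[neuron]) for name, x in inputs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # 3 hidden neurons, 2 input features
b = [0.0, 0.0, 0.0]
inputs = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}
print(top_activating_inputs(W, b, inputs, neuron=0))  # [('a', 2.0), ('c', 1.0)]
```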

Research¹² at NVIDIA: take a 15-billion-parameter model and prune away some of its layers, resulting in 8B and 4B models with a minimal drop in performance
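One simple way to decide which layers to prune is leave-one-out importance: skip each layer in turn, measure how much the model's outputs change, and drop the layers that matter least. This is a hedged sketch on a toy model whose "layers" are scalar functions; it is not necessarily the criterion used in the NVIDIA work, which, per its title, also relies on knowledge distillation to recover accuracy.

```python
from typing import Callable

Layer = Callable[[float], float]


def run(layers: list[Layer], x: float) -> float:
    for layer in layers:
        x = layer(x)
    return x


def layer_importance(layers: list[Layer], inputs: list[float]) -> list[float]:
    """Mean squared change in output when each layer is skipped."""
    reference = [run(layers, x) for x in inputs]
    scores = []
    for i in range(len(layers)):
        without_i = layers[:i] + layers[i + 1:]
        err = sum((run(without_i, x) - y) ** 2 for x, y in zip(inputs, reference))
        scores.append(err / len(inputs))
    return scores


def prune_layers(layers: list[Layer], inputs: list[float], n_remove: int) -> list[Layer]:
    scores = layer_importance(layers, inputs)
    drop = set(sorted(range(len(layers)), key=lambda i: scores[i])[:n_remove])
    return [layer for i, layer in enumerate(layers) if i not in drop]


# Toy layers: some barely change their input, others matter a lot.
layers: list[Layer] = [
    lambda x: 1.01 * x,
    lambda x: 2.0 * x,
    lambda x: x + 0.001,
    lambda x: x + 1.0,
]
inputs = [1.0, 2.0, 3.0]
print(layer_importance(layers, inputs))
smaller = prune_layers(layers, inputs, n_remove=2)
print(len(smaller), run(smaller, 1.0), run(layers, 1.0))
```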

Adversarial attacks¹³:

take Llama, compute gradients through the model (which requires access to the weights), and optimize a prompt that jailbreaks it (gets the model to do malicious things)
apply that same prompt to GPT-4, and it also jailbreaks GPT-4
this shows that access to open-weight models has implications for closed models, because the attacks transfer

Model independence tests¹⁴: check whether two models were trained independently
if they were not, was one fine-tuned from the other, or were both fine-tuned from a common model?

Idea 1: compute the similarity between the two models' weights, e.g., the cosine similarity of their MLP weights
Problem: what level of similarity is statistically significant, and does this threshold depend on the model architecture?

Idea 2: train many models from random initialization to get a null distribution { sim(θ₁′, θ₂) : θ₁′ = train(random init) }
p-value = P[ sim(θ₁′, θ₂) > sim(θ₁, θ₂) ]
Problem: we cannot retrain to obtain θ₁′ because we only have the final weights; furthermore, even if we knew the training recipe, nobody would spend millions of dollars retraining the model just to run this test

Idea 3: leverage the natural symmetries of the model
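perm(θ) = θ with its hidden units permuted; this yields counterfactual models that compute exactly the same function, so if the two models are independent, sim(θ₁, θ₂) should look like a typical draw of sim(perm(θ₁), θ₂)
p-value = P[ sim(perm(θ₁), θ₂) > sim(θ₁, θ₂) ]
Beyond deciding whether two models are independent, the test also reveals which layers of one model were derived from which layers of the other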
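Ideas 1 and 3 can be combined into a small Monte Carlo permutation test. This is a sketch under simplifying assumptions: each model is reduced to a single first-layer MLP weight matrix (rows = hidden units), similarity is the cosine similarity of the flattened matrices, and the p-value uses the standard (1 + count) / (1 + trials) estimate with ties counted; the test in the cited paper may use different statistics.

```python
import math
import random

Matrix = list[list[float]]


def cosine_sim(A: Matrix, B: Matrix) -> float:
    """Idea 1: cosine similarity of the flattened weight matrices."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def permute_hidden_units(W: Matrix, perm: list[int]) -> Matrix:
    # Reorder hidden units (rows). In a full MLP the next layer's columns are
    # permuted the same way, so the network computes exactly the same function.
    return [W[i] for i in perm]


def independence_pvalue(W1: Matrix, W2: Matrix, trials: int = 999, seed: int = 0) -> float:
    """Idea 3: how often does a random permutation of W1 look at least as similar to W2?"""
    rng = random.Random(seed)
    observed = cosine_sim(W1, W2)
    perm = list(range(len(W1)))
    hits = 0
    for _ in range(trials):
        rng.shuffle(perm)
        if cosine_sim(permute_hidden_units(W1, perm), W2) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


rng = random.Random(42)
base = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(32)]
fine_tuned = [[w + 0.05 * rng.gauss(0, 1) for w in row] for row in base]   # derived
independent = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(32)]    # unrelated
print(independence_pvalue(base, fine_tuned))    # tiny: reject independence
print(independence_pvalue(base, independent))   # typically not small
```

Because permuting hidden units never changes what the model computes, the permuted copies act as free counterfactuals, which avoids Idea 2's need to retrain.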

Reflections on open-weight access:

strong open-weight models (e.g., Llama 3) have been immensely valuable
enables research on interpretability, fine-tuning, distillation, merging (all reproducible)
open question: how can weight modifications yield coherent functional changes? There is no guarantee of this; in general, one has to go back and fine-tune the model
teaches us about API models (e.g., adversarial attacks transfer)
new problems motivated by open-weights (e.g., model independence testing)
but still confined by the blueprint of existing models

Open-source

There are many open-source language model efforts, though at present they are far weaker than the strongest open-weight or API models

Historical context of free and open-source software

roots: hacker ethic (MIT in the 1950s) + academia (for centuries)
values: creativity, exploration, transparency, collaboration, resistance against authority

Timeline	Event
1983	Richard Stallman started GNU (bash, ls, …)
1991	Linus Torvalds started Linux
1998	Open Source Initiative (OSI): coined and defined “open source”

Per the OSI's Open Source AI Definition, an open-source AI is an AI system made available under terms and in a way that grant the freedom to:

use the system for any purpose and without having to ask for permission
study how the system works and inspect its components
modify the system for any purpose, including to change its output
share the system for others to use with or without modifications, for any purpose

Examples of projects from the speaker's research group on the data recipe, learning algorithm, and architecture, all of which require training models from scratch:

project 1¹⁵ (DoReMi): a procedure based on distributionally robust optimization (DRO) that automatically tunes the mixture weights of the pretraining data domains; this trains a model 2.6 times faster than with manually specified weights
project 2¹⁶ (Sophia): a new optimizer that uses second-order information, specifically an estimate of the diagonal Hessian, together with clipping
project 3¹⁷ (Backpack language models): an alternative architecture to the Transformer that is less monolithic, more interpretable, and allows for precise model editing
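The domain-reweighting step of project 1 (DoReMi) can be sketched as a multiplicative-weights update: domains where a small proxy model's loss most exceeds a reference model's loss get upweighted, followed by smoothing toward uniform. This follows our reading of the DoReMi paper; the learning rate, smoothing constant, domain names, and loss values are assumptions, and the proxy-model training between updates is omitted.

```python
import math


def doremi_update(
    weights: list[float],
    proxy_losses: list[float],
    ref_losses: list[float],
    lr: float = 1.0,
    smoothing: float = 1e-3,
) -> list[float]:
    # Excess loss per domain: how much worse the proxy is than the reference (clipped at 0).
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, ref_losses)]
    # Exponentiated-gradient step: upweight domains with high excess loss.
    unnormalized = [w * math.exp(lr * e) for w, e in zip(weights, excess)]
    total = sum(unnormalized)
    k = len(weights)
    # Mix with the uniform distribution so no domain's weight collapses to zero.
    return [(1 - smoothing) * u / total + smoothing / k for u in unnormalized]


weights = [1 / 3] * 3   # e.g. web, code, books (hypothetical domains)
new = doremi_update(weights, proxy_losses=[2.0, 3.0, 2.5], ref_losses=[2.0, 2.5, 2.6])
print([round(w, 3) for w in new])
```

As we understand the paper, this update runs repeatedly during proxy training and the average of the weights over steps becomes the data mixture for the large model.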
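For project 2 (Sophia), our reading of the paper is that the update is θ ← θ − lr · clip(m / max(γ·h, ε), 1), where m is an exponential moving average (EMA) of gradients and h an EMA of a diagonal Hessian estimate refreshed every few steps. This sketch swaps the paper's stochastic Hessian estimators for the exact diagonal of a toy quadratic and omits weight decay; hyperparameters are assumptions.

```python
def sophia_step(
    theta: list[float],
    grad: list[float],
    hess_diag: list[float],
    m: list[float],
    h: list[float],
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.99,
    gamma: float = 1.0,
    eps: float = 1e-12,
    update_hessian: bool = True,
) -> tuple[list[float], list[float], list[float]]:
    # EMA of gradients (momentum).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # EMA of the diagonal Hessian estimate (refreshed only every few steps in practice).
    if update_hessian:
        h = [beta2 * hi + (1 - beta2) * di for hi, di in zip(h, hess_diag)]
    # Per-coordinate Newton-like step m/h, clipped to [-1, 1] so that tiny or
    # unreliable curvature estimates cannot produce huge updates.
    steps = [max(-1.0, min(1.0, mi / max(gamma * hi, eps))) for mi, hi in zip(m, h)]
    theta = [t - lr * s for t, s in zip(theta, steps)]
    return theta, m, h


# Toy quadratic f(θ) = ½ Σ a_i θ_i² with 100x different curvatures; the exact
# diagonal Hessian a stands in for Sophia's stochastic estimator.
a = [100.0, 1.0]
theta, m, h = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
for _ in range(3000):
    grad = [ai * ti for ai, ti in zip(a, theta)]
    theta, m, h = sophia_step(theta, grad, a, m, h)
print(theta)
```

Dividing by the curvature makes both coordinates move at the same rate despite the 100x difference, while the clip keeps the very first steps (when h is still near zero) bounded by `lr`.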
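For project 3, a Backpack language model represents each vocabulary word by several non-contextual sense vectors and computes the output at a position as a weighted sum of the sense vectors of the tokens in context, with the weights produced by a Transformer (the result is then mapped to vocabulary logits). In this sketch the contextualization weights `alphas` are given directly rather than computed, and the words, dimensions, and numbers are made up; editing one sense vector then changes the output wherever that word appears.

```python
Vector = list[float]


def backpack_output(
    sense_table: dict[str, list[Vector]],
    tokens: list[str],
    alphas: list[list[float]],
) -> Vector:
    """Output = sum over context tokens j and senses l of alphas[l][j] * sense_l(token_j)."""
    dim = len(sense_table[tokens[0]][0])
    out = [0.0] * dim
    for j, token in enumerate(tokens):
        for l, sense in enumerate(sense_table[token]):
            for d in range(dim):
                out[d] += alphas[l][j] * sense[d]
    return out


def edit_sense(
    sense_table: dict[str, list[Vector]], word: str, sense: int, scale: float
) -> dict[str, list[Vector]]:
    """Precise edit: rescale one sense vector of one word (returns a copy)."""
    edited = {w: [list(v) for v in senses] for w, senses in sense_table.items()}
    edited[word][sense] = [scale * x for x in edited[word][sense]]
    return edited


senses = {"river": [[1.0, 0.0], [0.0, 1.0]], "bank": [[2.0, 0.0], [0.0, 2.0]]}
alphas = [[0.5, 0.25], [0.0, 1.0]]   # alphas[sense][position]
print(backpack_output(senses, ["river", "bank"], alphas))  # [1.0, 2.0]
print(backpack_output(edit_sense(senses, "bank", 1, 0.0), ["river", "bank"], alphas))  # [1.0, 0.0]
```

Because the output is linear in the sense vectors, the effect of an edit is predictable, which is what makes this design easier to interpret than a monolithic Transformer.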

Would the results hold if we scaled up? / Where do we get the compute?

experiment with smaller-scale models (different architectures, data, etc.) in the belief that the results will scale up
there are actually a lot of idle GPUs; the technical challenge is that the interconnect for decentralized compute is at least two orders of magnitude slower than within a datacenter (1 Gbps vs. 100 Gbps)
- research¹⁸: with some clever scheduling and low-level systems work, training a 1B-parameter model is only ~2x slower than in a datacenter; a better algorithm¹⁹ was developed subsequently
fund the public good: get more compute for public academic AI research
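Back-of-the-envelope arithmetic shows why interconnect speed matters. Assuming a 1B-parameter model whose full gradient (2 bytes per value, e.g., bf16) is sent once per synchronization, and ignoring all-reduce overheads, compression, and overlap with computation, the transfer time per sync follows directly from the bandwidth. The `sync_every` parameter roughly models low-communication schemes in which workers take many local steps between synchronizations; the value 500 is only an example.

```python
def seconds_per_sync(num_params: int, bytes_per_param: int, bandwidth_gbps: float) -> float:
    """Time to send one full copy of the gradients (or weights) over a link."""
    bits = num_params * bytes_per_param * 8
    return bits / (bandwidth_gbps * 1e9)


def comm_seconds_per_step(
    num_params: int, bytes_per_param: int, bandwidth_gbps: float, sync_every: int = 1
) -> float:
    """Average communication time per training step if workers sync every `sync_every` steps."""
    return seconds_per_sync(num_params, bytes_per_param, bandwidth_gbps) / sync_every


params = 1_000_000_000
print(seconds_per_sync(params, 2, 100.0))   # datacenter link: 0.16 s
print(seconds_per_sync(params, 2, 1.0))     # decentralized link: 16.0 s
print(comm_seconds_per_step(params, 2, 1.0, sync_every=500))  # 0.032 s
```

The 100x bandwidth gap becomes a 100x gap in communication time per sync; syncing less often is one way to win most of it back.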

Goal: understand how data and architecture determine model behavior (hard even with full access)

References

1. Pranav Rajpurkar et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 [cs.CL]. 2016.
2. Qian Huang et al. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation. arXiv:2310.03302 [cs.LG]. 2024.
3. Jun Shern Chan et al. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. arXiv:2410.07095 [cs.CL]. 2024.
4. Zhengyao Jiang et al. AIDE: AI-Driven Exploration in the Space of Code. arXiv:2502.13138 [cs.AI]. 2025.
5. Xingyao Wang et al. OpenHands: An Open Platform for AI Software Developers as Generalist Agents. arXiv:2407.16741 [cs.SE]. 2024.
6. Zachary S. Siegel et al. CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark. arXiv:2409.11363 [cs.CL]. 2024.
7. Chenglei Si, Diyi Yang, Tatsunori Hashimoto. Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. arXiv:2409.04109 [cs.CL]. 2024.
8. Andy K. Zhang et al. Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models. arXiv:2408.08926 [cs.CR]. 2024.
9. Joon Sung Park et al. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC]. 2023.
10. Joon Sung Park et al. Generative Agent Simulations of 1,000 People. arXiv:2411.10109 [cs.AI]. 2024.
11. Transluce. Monitor: An AI-Driven Observability Interface.
12. Saurav Muralidharan et al. Compact Language Models via Pruning and Knowledge Distillation. arXiv:2407.14679 [cs.CL]. 2024.
13. Andy Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043 [cs.CL]. 2023.
14. Sally Zhu et al. Independence Tests for Language Models. arXiv:2502.12292 [cs.LG]. 2025.
15. Sang Michael Xie et al. DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. arXiv:2305.10429 [cs.CL]. 2023.
16. Hong Liu et al. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. arXiv:2305.14342 [cs.LG]. 2023.
17. John Hewitt et al. Backpack Language Models. arXiv:2305.16765 [cs.CL]. 2023.
18. Binhang Yuan et al. Decentralized Training of Foundation Models in Heterogeneous Environments. arXiv:2206.01288 [cs.DC]. 2022.
19. Arthur Douillard et al. DiLoCo: Distributed Low-Communication Training of Language Models. arXiv:2311.08105 [cs.LG]. 2023.
