ucb_agentic_ai

Lecture 10: Open-Source and Science in the Era of Foundation Models

Link to lecture recording on YouTube

Date: 2024-11-18

Speaker: Percy Liang

Speaker’s Social Profile: Website / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Work:

Notes

Over the last few years, the capabilities of foundation models have skyrocketed, while openness and access to these models have plummeted almost symmetrically.
We used to have access to the full paper, weights, code, and data; now all we have is an API.

Access shapes research

| Timeline | Technology | Research |
| --- | --- | --- |
| 1990s | Internet (text in digital form) | statistical NLP methods |
| 2010s | crowdsourcing platforms<br>e.g., Amazon Mechanical Turk | organic, large-scale annotated datasets<br>e.g., ImageNet, SQuAD [1] |
| 2010s | GPUs | deep learning methods |

Types of access:

| Type | Analogy | Opportunity |
| --- | --- | --- |
| API access | cognitive scientist: measure behavior of a black box, prompt → response<br>cannot look inside the black box | build agents to solve complex problems |
| Open-weight access | neuroscientist: probe internal activations | understand mechanisms, create novel derivatives such as distillations and fine-tunes |
| Open-source | computer scientist: build a system and control every part of it | question everything (dataset, model architecture, training procedure, etc.) |

API access

Agent architecture:

Each of the following is powered by one of these black box LLM APIs: perceive, retrieve, reflect, act, plan
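The architecture above can be sketched as a loop in which every step is a separate call to the same black-box API. This is a minimal illustration, not the speaker's implementation: `stub_llm`, the `Agent` class, the prompt format, and the ordering of the steps (plan before act) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical black-box interface: prompt in, text out. We never see inside it.
LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"<response to {prompt[:40]!r}>"


class Agent:
    # Step order is an assumption of this sketch.
    STEPS = ("perceive", "retrieve", "reflect", "plan", "act")

    def __init__(self, llm: LLM) -> None:
        self.llm = llm
        self.trace: List[Tuple[str, str]] = []

    def _call(self, step: str, text: str) -> str:
        # Every step is just another prompt to the same black box.
        output = self.llm(f"[{step}] {text}")
        self.trace.append((step, output))
        return output

    def step(self, observation: str) -> str:
        text = observation
        for name in self.STEPS:
            # Each step consumes the previous step's output.
            text = self._call(name, text)
        return text


agent = Agent(stub_llm)
action = agent.step("The door to the server room is open.")
for name, output in agent.trace:
    print(name, "->", output)
```

Because the agent only depends on the `LLM` callable, swapping the stub for a real API client would not change the loop, which is the point of the API-access setting: the researcher composes behavior around a model that cannot be inspected.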

Two types of agents: Problem-solving agents and simulation agents

Problem-solving agents:

| Application | Examples | Details | Remarks |
| --- | --- | --- | --- |
| Research | MLAgentBench [2] | Build an LLM agent to do the following:<ul><li>task: build the best machine learning model</li><li>given: a machine learning problem, some data, starter code, evaluator for test accuracy</li><li>human action loop: write and run some code, think and revise based on what happened</li></ul>Self-improvement: solve task → improve model → solve task better | Related work:<br>MLE-Bench [3]<br>AIDE [4]<br>OpenHands (OpenDevin) [5]<br>CORE-Bench [6]<br>Generating novel research ideas [7] |
| Cybersecurity | Cybench [8] | The agent has access to a running server, the code running on the server, and a Bash shell<br>The agent reads the code, identifies a potential security vulnerability, and exploits it by running Bash commands | Reflections: dual implications of a cybersecurity agent:<ul><li>quantified evaluation of cyber risk (offense)</li><li>penetration testing tool (defense): identify and break code before deployment</li></ul> |

Simulation agents: a virtual world called Smallville [9]

Simulations of real people [10]: interviews capture a tremendous amount of richness

Reflections of agents and API access:

Open-weight access

Open-weight: dual-use foundation models with widely available weights [Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence]; the licenses of these models have restrictions

Reproducibility: API models get deprecated once in a while, which is bad for reproducibility
Open weights can be stored, so earlier experiments can be reproduced

Transluce model investigator [11]: open the hood and look at the high-activation neurons

Research [12] at NVIDIA: take a 15-billion-parameter model and prune away some of the layers, resulting in 8B and 4B models with a minimal drop in performance
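To make "prune away some of the layers" concrete, here is a toy sketch of one generic depth-pruning heuristic: score each layer by how much the output changes when it is skipped, then keep the highest-scoring layers. This is not the procedure from [12], which also relies on knowledge distillation; the toy residual layers, the ablation-based score, and all names are assumptions made for illustration.

```python
import math


def make_layer(scale):
    """Toy residual layer: x -> x + scale * tanh(x)."""
    return lambda x: x + scale * math.tanh(x)


def forward(layers, x):
    for layer in layers:
        x = layer(x)
    return x


def layer_importance(layers, inputs):
    """Mean absolute change in the output when each layer is skipped."""
    baseline = [forward(layers, x) for x in inputs]
    scores = []
    for i in range(len(layers)):
        ablated = layers[:i] + layers[i + 1:]
        diff = sum(abs(forward(ablated, x) - b) for x, b in zip(inputs, baseline))
        scores.append(diff / len(inputs))
    return scores


def prune_layers(layers, inputs, keep):
    """Keep the `keep` most important layers, preserving their original order."""
    scores = layer_importance(layers, inputs)
    top = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)[:keep]
    kept = sorted(top)
    return kept, [layers[i] for i in kept]


# Two layers contribute almost nothing, so they should be the ones removed.
scales = [0.5, 0.001, 0.4, 0.002, 0.3]
layers = [make_layer(s) for s in scales]
calibration = [-1.0, -0.5, 0.5, 1.0]  # hypothetical calibration inputs

kept, pruned = prune_layers(layers, calibration, keep=3)
print("kept layer indices:", kept)
print("full output:  ", forward(layers, 1.0))
print("pruned output:", forward(pruned, 1.0))
```

Open-weight access is what makes this kind of derivative possible at all: scoring and removing layers requires the weights themselves, not just an API.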

Adversarial attacks [13]:

Model independence tests [14]: check whether two models were trained independently.
If they were not, was one fine-tuned from the other, or were both fine-tuned from a common model?

Idea 1: compute the similarity between the two; e.g., cosine similarity of MLP weights
Problem: what is the level of statistical significance, and does this threshold depend on the model architecture?
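A small sketch of Idea 1, using toy random matrices in place of real MLP weights (the matrix sizes, noise level, and variable names are assumptions). It shows why the raw number is suggestive but, as the problem above notes, comes with no built-in significance threshold.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equally shaped matrices, treated as flat vectors."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
hidden, d_in = 8, 16

# Toy stand-ins for one MLP weight matrix from each model.
base = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(hidden)]
fine_tuned = [[x + rng.gauss(0, 0.05) for x in row] for row in base]
independent = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(hidden)]

sim_tuned = cosine_similarity(base, fine_tuned)
sim_indep = cosine_similarity(base, independent)
print(f"base vs fine-tuned:  {sim_tuned:.3f}")
print(f"base vs independent: {sim_indep:.3f}")
```

The two numbers are far apart in this toy setting, but nothing in the computation says where to draw the line between them for a given architecture.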

Idea 2: train a bunch of models { sim( θ₁’, θ₂ ): θ₁’ = train( random init ) }
p-value = P [ sim( θ₁’, θ₂ ) > sim( θ₁, θ₂ ) ]
Problem: it is impossible to retrain and obtain θ₁’, since we only have the final weights and not the training recipe; furthermore, even if we knew the recipe, we would probably not spend millions of dollars training the model again just to run this test

Idea 3: leverage the natural symmetries of the model
perm(θ) = permute the hidden units defined by θ to get counterfactuals
p-value = P [ sim( perm(θ₁), θ₂ ) > sim( θ₁, θ₂ ) ]
Beyond telling whether two models are independent, the test also indicates which layers of one model were derived from which layers of the other
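Idea 3 can be sketched as a Monte Carlo permutation test on a single toy weight matrix. This is an illustration under stated assumptions, not the procedure from [14]: each row is treated as one hidden unit, similarity is entry-wise cosine similarity, and the estimate uses `>=` with an add-one correction instead of the strict inequality above. In a real MLP, permuting hidden units means permuting the rows of the up-projection and the columns of the down-projection together so the function is unchanged.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equally shaped matrices, compared entry by entry."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    return dot / (norm_a * norm_b)


def permute_hidden_units(w, rng):
    """Shuffle the rows of w; each row holds one hidden unit's incoming weights."""
    rows = list(w)
    rng.shuffle(rows)
    return rows


def permutation_p_value(w1, w2, num_perms=200, seed=0):
    """Estimate P[sim(perm(w1), w2) >= sim(w1, w2)] over random permutations."""
    rng = random.Random(seed)
    observed = cosine_similarity(w1, w2)
    hits = sum(
        cosine_similarity(permute_hidden_units(w1, rng), w2) >= observed
        for _ in range(num_perms)
    )
    # Add-one correction (an assumption of this sketch) keeps the estimate above zero.
    return (hits + 1) / (num_perms + 1)


rng = random.Random(1)
base = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(32)]
fine_tuned = [[x + rng.gauss(0, 0.05) for x in row] for row in base]
unrelated = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(32)]

p_dependent = permutation_p_value(base, fine_tuned)
p_unrelated = permutation_p_value(base, unrelated)
print(f"fine-tuned pair: p = {p_dependent:.4f}")
print(f"unrelated pair:  p = {p_unrelated:.4f}")
```

The permuted copies play the role of the retrained models in Idea 2 without any training cost: for independently trained models, the original ordering of hidden units is no more aligned with θ₂ than a shuffled one, so a very small p-value is evidence against independence.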

Reflections of open-weight access:

Open-source

There are many open-source language model efforts, though at present they are far weaker than the strongest open-weight or API models

Historical context of free and open-source software

| Timeline | Event |
| --- | --- |
| 1983 | Richard Stallman started GNU (bash, ls, …) |
| 1991 | Linus Torvalds started Linux |
| 1998 | Open Source Initiative (OSI): coined and defined “open-source” |

An open-source AI is an AI system made available under terms and in a way that grant the freedom to:

Examples of speaker’s research team’s projects about learning algorithm, architecture and data recipe, where training models from scratch is needed:

Would the results hold if we scaled up? / Where do we get the compute?

Goal: understand data, architecture → model behavior (hard even with full access)

References

  1. Pranav Rajpurkar et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv:1606.05250 [cs.CL]. 2016. 

  2. Qian Huang et al. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation. arXiv:2310.03302 [cs.LG]. 2024. 

  3. Jun Shern Chan et al. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. arXiv:2410.07095 [cs.CL]. 2024. 

  4. Zhengyao Jiang et al. AIDE: AI-Driven Exploration in the Space of Code. arXiv:2502.13138 [cs.AI]. 2025. 

  5. Xingyao Wang et al. OpenHands: An Open Platform for AI Software Developers as Generalist Agents. arXiv:2407.16741 [cs.SE]. 2024. 

  6. Zachary S. Siegel et al. CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark. arXiv:2409.11363 [cs.CL]. 2024. 

  7. Chenglei Si, Diyi Yang, Tatsunori Hashimoto. Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. arXiv:2409.04109 [cs.CL]. 2024. 

  8. Andy K. Zhang et al. Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models. arXiv:2408.08926 [cs.CR]. 2024. 

  9. Joon Sung Park et al. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC]. 2023. 

  10. Joon Sung Park et al. Generative Agent Simulations of 1,000 People. arXiv:2411.10109 [cs.AI]. 2024. 

  11. Transluce: Monitor: An AI-Driven Observability Interface 

  12. Saurav Muralidharan et al. Compact Language Models via Pruning and Knowledge Distillation. arXiv:2407.14679 [cs.CL]. 2024. 

  13. Andy Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv:2307.15043 [cs.CL]. 2023. 

  14. Sally Zhu et al. Independence Tests for Language Models. arXiv:2502.12292 [cs.LG]. 2025. 

  15. Sang Michael Xie et al. DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. arXiv:2305.10429 [cs.CL]. 2023. 

  16. Hong Liu et al. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. arXiv:2305.14342 [cs.LG]. 2023. 

  17. John Hewitt et al. Backpack Language Models. arXiv:2305.16765 [cs.CL]. 2023. 

  18. Binhang Yuan et al. Decentralized Training of Foundation Models in Heterogeneous Environments. arXiv:2206.01288 [cs.DC]. 2022. 

  19. Arthur Douillard et al. DiLoCo: Distributed Low-Communication Training of Language Models. arXiv:2311.08105 [cs.LG]. 2023.