ucb_agentic_ai

Lecture 04: Open Training Recipes for Reasoning in Language Models

Link to lecture recording on YouTube

Date: 2025-02-24

Speaker: Hannaneh Hajishirzi

Speaker’s social profile: Website / University Profile / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Ph.D. in Computer Science, 2011, University of Illinois at Urbana-Champaign
B.Sc. in Computer Engineering (Software), Sharif University of Technology

Work:

Professor in Computer Science & Engineering, University of Washington
Senior Director of AI, Allen Institute for AI (AI2)

Notes

AI has reached its current state thanks to open scientific practices and fully open models (transparent, reproducible, accessible). Many advances are still needed to push language models beyond language and to mitigate their biases and risks
Project OLMo: a fully open ecosystem to develop, study, and advance LMs; open, documented, and reproducible

Stage	Tools
Pre-training	OLMo, OLMo₂, OLMoE, Dolma
Post-training	Tulu, OLMo-Instruct
Test-time Scaling	S1, Open Scholar, Self-RAG

Pre-training:

predict the next word in various contexts
data are large scale, usually from the web
the model that comes out of this stage is not ready for use: it is not safe, cannot follow human instructions, and is not good at reasoning
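
To make "predict the next word" concrete, here is a minimal sketch of the next-token cross-entropy objective. It is a toy illustration, not the OLMo training code: the three-word vocabulary, the hand-written logits, and the function names `softmax` and `next_token_loss` are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from the prefix ending at t.

    logits_per_position[t] holds one score per vocabulary entry for the
    token that follows position t (hypothetical model output).
    """
    targets = token_ids[1:]  # the target at position t is the next token
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy vocabulary and sentence (assumed for illustration): "the cat sat"
vocab = ["the", "cat", "sat"]
token_ids = [0, 1, 2]

# A model with no preference vs. a model that favors the correct next word
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
confident_logits = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

uniform_loss = next_token_loss(uniform_logits, token_ids)
confident_loss = next_token_loss(confident_logits, token_ids)
print(uniform_loss, confident_loss)
```

The uniform model's loss equals ln 3 (one of three words chosen at random), and the confident model's loss is much lower. Pre-training lowers this quantity over very large text corpora.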

Post-training:

align with human preferences
enable tool use, reasoning, and instruction following
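
One common way to express "align with human preferences" as a training signal is a pairwise (Bradley-Terry) preference loss, where the response a human preferred should score higher than the rejected one. The notes do not say which method the lecture covered, so treat the following as an assumed illustration; `preference_loss` and the example scores are hypothetical.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen response beats the rejected one.

    Computes -log(sigmoid(score_chosen - score_rejected)) in a numerically
    stable form. The scores are hypothetical scalar ratings of two responses.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is small when the preferred response scores higher, large otherwise
agrees_with_human = preference_loss(2.0, 0.0)
disagrees_with_human = preference_loss(0.0, 2.0)
print(agrees_with_human, disagrees_with_human)
```

Minimizing this loss over many human-labeled pairs pushes the scores, and a model trained against them, toward the responses people prefer.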

Post-training

Pre-training

Test-time Inference

[Incomplete, work in progress]

