ucb_agentic_ai

Lecture 02: Learning to Self-Improve & Reason with LLMs

Link to lecture recording on YouTube

Date: 2025-02-03

Speaker: Jason Weston

Work:

Goal: an AI that “trains” itself as much as possible

Research question: can this help it become superhuman?

When self-improving, there are two types of reasoning to improve:

| System | Characteristics | Example | Details |
| --- | --- | --- | --- |
| System 1 | reactive; relies on associations | LLMs | <ul><li>fixed compute per token</li><li>directly outputs an answer</li><li>failures: learns spurious or unwanted correlations, e.g., hallucination, sycophancy, jailbreaking…</li></ul> |
| System 2 | more deliberate and effortful | multiple “calls” to a System 1 LLM | <ul><li>planning, search, verifying, reasoning, etc.</li><li>dynamic computation (e.g., chain-of-thought, tree-of-thoughts…)</li></ul> |
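The System 1 / System 2 contrast can be illustrated with a minimal sketch under toy assumptions. `make_toy_system1` is a hypothetical stand-in for an LLM. It makes one call with fixed compute and directly outputs an answer: here it adds two integers but is wrong some fraction of the time. `system2_majority_vote` is a hypothetical System 2 controller that spends more compute by making several System 1 calls and returning the most common answer, in the style of self-consistency voting. This is one possible way to spend extra calls, not the method presented in the lecture.

```python
import random
from collections import Counter
from typing import Callable


# Hypothetical stand-in for a System 1 LLM: a single call with fixed compute
# that directly outputs an answer. This toy "model" adds two integers but,
# with probability error_rate, returns a nearby wrong answer.
def make_toy_system1(error_rate: float, seed: int = 0) -> Callable[[int, int], int]:
    rng = random.Random(seed)

    def system1(a: int, b: int) -> int:
        answer = a + b
        if rng.random() < error_rate:
            answer += rng.choice([-2, -1, 1, 2])  # plausible-looking mistake
        return answer

    return system1


def system2_majority_vote(system1: Callable[[int, int], int], a: int, b: int,
                          n_calls: int = 15) -> int:
    """Hypothetical System 2 controller: make n_calls calls to System 1 and
    return the most common answer (self-consistency-style majority voting)."""
    votes = Counter(system1(a, b) for _ in range(n_calls))
    return votes.most_common(1)[0][0]


def accuracy(solver: Callable[[int, int], int], problems: list[tuple[int, int]]) -> float:
    """Fraction of problems where the solver returns the true sum."""
    correct = sum(solver(a, b) == a + b for a, b in problems)
    return correct / len(problems)


if __name__ == "__main__":
    rng = random.Random(42)
    problems = [(rng.randint(0, 99), rng.randint(0, 99)) for _ in range(200)]
    model = make_toy_system1(error_rate=0.3, seed=0)
    print("System 1 (one call):    ", accuracy(model, problems))
    print("System 2 (15-call vote):",
          accuracy(lambda a, b: system2_majority_vote(model, a, b), problems))
```

Each wrong answer here is spread over several nearby values, while the correct answer recurs across calls. For that reason, voting over many calls usually recovers it. Other System 2 strategies from the table, such as planning, search (e.g., tree-of-thoughts), or a separate verification call, would replace `system2_majority_vote` with a different controller wrapped around the same `system1` function.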

[Incomplete, work in progress]
