ucb_agentic_ai

Lecture 02: Learning to Self-Improve & Reason with LLMs

Link to lecture recording on YouTube

Date: 2025-02-03

Speaker: Jason Weston

Speaker’s social profile: Company Profile / Google Scholar / GitHub / LinkedIn / X (Twitter)

Work:

Notes

Goal: an AI that “trains” itself as much as possible

Research question: can this help it become superhuman?

When self-improving, there are two types of reasoning to improve (System 1 and System 2):

| System | Characteristics | Example | Details |
| --- | --- | --- | --- |
| 1 | reactive and relies on associations | LLMs | <ul><li>fixed compute per token</li><li>directly outputs answer</li><li>failures: learns spurious / unwanted correlations: hallucination, sycophancy, jailbreaking…</li></ul> |
| 2 | more deliberate and effortful | multiple “calls” to system 1 LLM | <ul><li>planning, search, verifying, reasoning etc.</li><li>dynamic computation (e.g., chain-of-thought, tree-of-thoughts…)</li></ul> |
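
To make the "multiple calls to System 1" idea concrete, here is a minimal sketch, not taken from the lecture. It assumes one simple form of dynamic computation: sample several answers from a System 1 model and take a majority vote (in the spirit of self-consistency). The names `system1_stub` and `system2_majority_vote` are hypothetical, and the stub only simulates an LLM call with a noisy answer.

```python
import random
from collections import Counter
from typing import Callable


def system1_stub(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for ONE LLM call (System 1).

    Fixed compute, directly outputs an answer. It is noisy on purpose:
    it returns "4" about 70% of the time and a wrong answer otherwise.
    """
    return "4" if rng.random() < 0.7 else rng.choice(["3", "5"])


def system2_majority_vote(
    prompt: str,
    system1: Callable[[str, random.Random], str],
    n_calls: int = 15,
    seed: int = 0,
) -> str:
    """Sketch of System 2 as several System 1 calls plus aggregation.

    Compute is dynamic: raising n_calls spends more effort on the prompt.
    """
    rng = random.Random(seed)
    # Each call is an independent System 1 sample for the same prompt.
    answers = [system1(prompt, rng) for _ in range(n_calls)]
    # Aggregate by majority vote; ties resolve to the first-seen answer.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(system2_majority_vote("What is 2 + 2?", system1_stub))
```

Majority voting is only one way to combine calls. The planning, search, and verification mentioned in the table would replace the aggregation step with something more structured.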

[Incomplete, work in progress]

References