ucb_agentic_ai

Lecture 05: Some Challenges and Lessons from Training Agentic Models

Link to lecture recording on YouTube

Date: 2025-10-13

Speaker: Weizhu Chen

Speaker’s Social Profile: Company Profile / Google Scholar / GitHub / LinkedIn / X (Twitter)

Education:

Work:

Notes

Important aspects of agentic training:

Lessons from experience in the industry

RL data:

  1. verifiable
    • math
    • code
  2. non-verifiable:
    • open and subjective data: style, writing, safety
    • rubrics: much more complicated than most people think
    • data synthesis

Rubrics: scorable with steerability
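One way to read "scorable with steerability" is sketched below: a rubric is a list of weighted criteria, the score is the weighted fraction of criteria a judge marks as satisfied, and changing a weight steers what the score favors. The `Criterion` class, the `rubric_score` function, and the example criteria are hypothetical and not from the lecture. The judgments are passed in as plain booleans here, on the assumption that a human or a judge model produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line; the name and weight are illustrative, not from the lecture."""
    name: str
    weight: float


def rubric_score(criteria: list[Criterion], judgments: dict[str, bool]) -> float:
    """Return the weighted fraction of criteria judged as satisfied, in [0, 1]."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric needs a positive total weight")
    # A criterion missing from the judgments counts as not satisfied.
    earned = sum(c.weight for c in criteria if judgments.get(c.name, False))
    return earned / total


# Hypothetical rubric for a writing task; raising a weight steers the score
# toward that criterion.
criteria = [Criterion("concise", 1.0), Criterion("safe", 3.0)]
print(rubric_score(criteria, {"concise": False, "safe": True}))
```

Weights are assumed to be non-negative. A real rubric would likely need finer-grained judgments than yes/no, which is part of why rubrics are more complicated than they look.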

Data efficiency: RLVR (Reinforcement Learning with Verifiable Rewards) with one example
Observation: after ~100 training steps, RLVR with one example reaches ~30% accuracy, which is getting close to the ~36% accuracy of training with 1200 examples.
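To make "verifiable reward" concrete, here is a minimal sketch of a reward function for math answers. It assumes the final answer has already been extracted from the model output as a string. The function names `parse_number` and `math_reward` are illustrative and not from the lecture, and real RLVR pipelines typically use more robust answer matching.

```python
from fractions import Fraction


def parse_number(text: str):
    """Return the text as an exact Fraction, or None if it is not numeric."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(model_answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the answer matches the reference, else 0.0."""
    a = parse_number(model_answer)
    b = parse_number(reference)
    if a is not None and b is not None:
        # Exact rational comparison, so "0.5" and "1/2" count as equal.
        return 1.0 if a == b else 0.0
    # Assumption: fall back to exact string match for non-numeric answers.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


print(math_reward("0.5", "1/2"))
print(math_reward("3", "4"))
```

The reward is binary and needs no human judgment, which is what separates verifiable data (math, code) from the non-verifiable categories listed earlier.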

Data mix: curating high-quality data often pays off more than alchemy-style hyperparameter tuning of the training run.

Tips and examples:

  1. Hard problems are usually more useful for powerful models.
    • Put in more of the easier data at the beginning.
    • Mix in more difficult data as training moves forward and the model becomes stronger.
  2. The goodness of data is also model dependent.
    • For a coding model, people also ask non-coding questions, such as writing a document based on the code.
  3. Combine real data and synthetic data: real data helps in real cases, while synthetic data can be formalized in multiple styles.
    • Synthesize more data if the model is good at some categories but not others.
  4. Use powerful models as judges to generate more data.
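The tip about starting with easier data and mixing in harder data later can be sketched as a simple mixing schedule. This is only an illustration under stated assumptions: the linear schedule, the 20% and 80% endpoints, and the names `hard_fraction` and `sample_batch` are hypothetical and not from the lecture.

```python
import random


def hard_fraction(step: int, total_steps: int,
                  start: float = 0.2, end: float = 0.8) -> float:
    """Linearly move the share of hard examples from `start` to `end`."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start * (1.0 - t) + end * t


def sample_batch(easy, hard, step, total_steps, batch_size, rng):
    """Draw a batch whose hard/easy split follows the schedule at `step`."""
    n_hard = round(batch_size * hard_fraction(step, total_steps))
    # Sampling with replacement keeps the sketch valid for small pools.
    batch = rng.choices(hard, k=n_hard) + rng.choices(easy, k=batch_size - n_hard)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
early = sample_batch(["e1", "e2"], ["h1", "h2"], 0, 100, 10, rng)
late = sample_batch(["e1", "e2"], ["h1", "h2"], 100, 100, 10, rng)
print(sum(x.startswith("h") for x in early), sum(x.startswith("h") for x in late))
```

In practice the schedule could be driven by the model's measured accuracy rather than the step count, since the notes say the right difficulty depends on how strong the model has become.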

[Incomplete, work in progress]

References