Blog

All posts

July 8, 2026

Why I Am Studying Classic Reinforcement Learning Now

Modern AI is rediscovering old reinforcement learning problems: reward design, credit assignment, exploration, evaluation, and learning from feedback.

AI, RL, RLVR, Reinforcement Learning

May 27, 2026

GRPO: Learning From the Other Answers in the Room

A simple breakdown of GRPO, a reinforcement learning method that removes the need for a separate value model. Full of analogies and images to make the concept more approachable.

GRPO, PPO, RLVR, Reinforcement Learning

May 12, 2026

What Longer-Timeline Intuitions About RL Progress Missed

An argument for why AI progress did not slow in the RL regime as much as some longer-timeline intuitions expected.

AI, RL, Forecasting

May 3, 2026

PPO Explained for Beginners

A beginner-friendly breakdown of Proximal Policy Optimization — the RL algorithm that turned raw base models into useful AI assistants.

AI, RL, MachineLearning, GPU

April 29, 2026

Hello World

My first blog post — a quick intro to what I'll be writing about.

Personal, Writing