Ep#38: Q Learning is Not Yet Scalable

With Seohong Park, Chris Paxton, and Michael Cho

Oct 24, 2025

Offline reinforcement learning is crucial for robotics, but does it scale? We talk to Seohong Park, who explains that for long-horizon manipulation problems the answer may be no, at least not yet. There are, however, tricks you can use to make it work effectively.

Watch episode #38 of RoboPapers with Michael Cho and Chris Paxton now!

Abstract:

In this work, we study the scalability of offline reinforcement learning (RL) algorithms. In principle, a truly scalable offline RL algorithm should be able to solve any given problem, regardless of its complexity, given sufficient data, compute, and model capacity. We investigate if and how current offline RL algorithms match up to this promise on diverse, challenging, previously unsolved tasks, using datasets up to 1000x larger than typical offline RL datasets. We observe that despite scaling up data, many existing offline RL algorithms exhibit poor scaling behavior, saturating well below the maximum performance. We hypothesize that the horizon is the main cause behind the poor scaling of offline RL. We empirically verify this hypothesis through several analysis experiments, showing that long horizons indeed present a fundamental barrier to scaling up offline RL. We then show that various horizon reduction techniques substantially enhance scalability on challenging tasks. Based on our insights, we also introduce a minimal yet scalable method named SHARSA that effectively reduces the horizon. SHARSA achieves the best asymptotic performance and scaling behavior among our evaluation methods, showing that explicitly reducing the horizon unlocks the scalability of offline RL. Code: this https URL
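
To make the horizon argument concrete, here is a minimal sketch of one well-known horizon reduction technique, n-step returns. The abstract does not say which techniques were evaluated, so the choice of n-step returns, the function names `n_step_target` and `num_backups`, and the numbers used are illustrative assumptions, not the authors' SHARSA implementation.

```python
def n_step_target(rewards, bootstrap_value, gamma, n):
    """Return the n-step TD target: sum_{i<n} gamma^i * r_i + gamma^n * V(s_n).

    `rewards` holds the observed rewards r_0 ... r_{n-1} from the dataset and
    `bootstrap_value` is the current value estimate at the state n steps ahead.
    """
    if len(rewards) < n:
        raise ValueError("need at least n rewards to form an n-step target")
    target = 0.0
    for i in range(n):
        target += (gamma ** i) * rewards[i]
    return target + (gamma ** n) * bootstrap_value


def num_backups(horizon, n):
    """Rough count of chained bootstrapped backups needed to carry value
    information across `horizon` environment steps when each backup spans
    n steps (ceiling division). A simplification for illustration only."""
    return -(-horizon // n)


if __name__ == "__main__":
    # Three rewards of 1, value estimate 10 at the third successor state.
    print(n_step_target([1.0, 1.0, 1.0], 10.0, gamma=0.5, n=3))
    # One-step versus ten-step backups on a hypothetical 1000-step task.
    print(num_backups(1000, 1), num_backups(1000, 10))
```

Under these assumptions, a hypothetical 1000-step task needs roughly 1000 chained one-step backups but only about 100 with n = 10, which is the sense in which the effective horizon shrinks and bootstrapping errors have fewer steps over which to compound.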

And from the blog post:

Over the past few years, we’ve seen that next-token prediction scales, denoising diffusion scales, contrastive learning scales, and so on, all the way to the point where we can train models with billions of parameters with a scalable objective that can eat up as much data as we can throw at it. Then, what about reinforcement learning (RL)? Does RL also scale like all the other objectives?
