For robots to be useful, they can't just dance: they must be able to physically interact with the world around them. Unfortunately, the motion tracking policies you see performing dancing or martial arts are not capable of the precise, forceful interaction needed to accomplish useful tasks in the world.
Siheng and Yanjie join us to talk about ResMimic, their new paper that takes a general-purpose human motion-tracking policy and improves it with a residual policy to reliably interact with objects.
To learn more, watch Episode #47 of RoboPapers, hosted by Michael Cho and Chris Paxton, today!
Abstract:
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines.
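To make the residual idea from the abstract concrete, here is a minimal sketch in PyTorch: a frozen, pretrained GMT base policy proposes human-like whole-body actions, and a small learned network adds a bounded, object-aware correction on top. The network sizes, observation split, and `residual_scale` bound are illustrative assumptions, not the authors' actual architecture; see the paper for the real design and training setup.

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Illustrative sketch of two-stage residual control: a frozen
    general motion tracking (GMT) base policy generates whole-body
    actions, and a learned residual refines them using object state.
    Dimensions and layer sizes here are assumptions, not the paper's."""

    def __init__(self, base_policy: nn.Module, robot_obs_dim: int,
                 object_obs_dim: int, act_dim: int,
                 residual_scale: float = 0.1):
        super().__init__()
        self.base_policy = base_policy  # pretrained GMT policy, kept frozen
        for p in self.base_policy.parameters():
            p.requires_grad_(False)
        # Small MLP mapping robot + object observations to a bounded
        # correction on top of the base action.
        self.residual = nn.Sequential(
            nn.Linear(robot_obs_dim + object_obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )
        self.residual_scale = residual_scale  # keeps corrections small

    def forward(self, robot_obs: torch.Tensor,
                object_obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # Base policy sees only robot state, as in human-only training.
            base_action = self.base_policy(robot_obs)
        correction = self.residual(
            torch.cat([robot_obs, object_obs], dim=-1))
        return base_action + self.residual_scale * correction
```

Bounding the correction lets the base policy's human-like motion dominate while the residual handles locomotion refinement and object contact, which matches the division of labor the abstract describes.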
Project Page: https://resmimic.github.io/
arXiv: https://www.arxiv.org/abs/2510.05070
Original post on X: https://x.com/SihengZhao/status/1975985531298476316