
Ep#043: Attention-based map encoding for learning generalized legged locomotion

With Chong Zhang

Walking robots can do all kinds of exciting things like dancing, running, and martial arts — but for them to be useful, they must be able to use their legs to handle terrain and move over obstacles, not just around them. So, how can we train walking policies for legged robots that are actually useful?

Unlike in manipulation, these policies are trained end-to-end with sim-to-real reinforcement learning, using attention. It turns out "attention is all you need" may apply to locomotion, too. Chong Zhang joins us to explain more.

Watch Episode #43 of RoboPapers, hosted by Michael Cho and Chris Paxton, now, to find out more.

Abstract:

Dynamic locomotion of legged robots is a critical yet challenging topic in expanding the operational range of mobile robots. It requires precise planning when possible footholds are sparse, robustness against uncertainties and disturbances, and generalizability across diverse terrains. Although traditional model-based controllers excel at planning on complex terrains, they struggle with real-world uncertainties. Learning-based controllers offer robustness to such uncertainties but often lack precision on terrains with sparse steppable areas. Hybrid methods achieve enhanced robustness on sparse terrains by combining both methods but are computationally demanding and constrained by the inherent limitations of model-based planners. To achieve generalized legged locomotion on diverse terrains while preserving the robustness of learning-based controllers, this paper proposes an attention-based map encoding conditioned on robot proprioception, which is trained as part of the controller using reinforcement learning. We show that the network learns to focus on steppable areas for future footholds when the robot dynamically navigates diverse and challenging terrains. We synthesized behaviors that exhibited robustness against uncertainties while enabling precise and agile traversal of sparse terrains. In addition, our method offers a way to interpret the topographical perception of a neural network. We have trained two controllers for a 12-degree-of-freedom quadrupedal robot and a 23-degree-of-freedom humanoid robot and tested the resulting controllers in the real world under various challenging indoor and outdoor scenarios, including ones unseen during training.
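
For readers wondering what an "attention-based map encoding conditioned on robot proprioception" could look like in code, here is a minimal sketch in PyTorch. It is not the authors' implementation: the dimensions, the single cross-attention layer, and the class and variable names are illustrative assumptions. The idea it illustrates is the one the abstract describes: proprioception forms the query, local height-map patches form the keys and values, and the attention weights over patches can be inspected to see which parts of the terrain the policy attends to.

```python
# Minimal sketch (not the paper's code): cross-attention over a local height map,
# with the query conditioned on robot proprioception. All sizes are illustrative.
import torch
import torch.nn as nn


class AttentionMapEncoder(nn.Module):
    def __init__(self, proprio_dim=48, patch_dim=16, embed_dim=64, num_heads=4):
        super().__init__()
        # Proprioception (joint states, base velocity, gravity direction, ...) -> query
        self.query_proj = nn.Linear(proprio_dim, embed_dim)
        # Flattened height-map patches -> keys and values
        self.patch_proj = nn.Linear(patch_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, proprio, height_patches):
        # proprio:        (batch, proprio_dim)
        # height_patches: (batch, num_patches, patch_dim)
        q = self.query_proj(proprio).unsqueeze(1)     # (batch, 1, embed_dim)
        kv = self.patch_proj(height_patches)          # (batch, num_patches, embed_dim)
        encoded, attn_weights = self.attn(q, kv, kv)  # attend over map patches
        # attn_weights has shape (batch, 1, num_patches); visualizing it shows
        # which terrain patches the encoder focuses on -- the interpretability
        # angle mentioned in the abstract.
        return encoded.squeeze(1), attn_weights


# Toy usage: one forward pass with random inputs.
if __name__ == "__main__":
    enc = AttentionMapEncoder()
    proprio = torch.randn(2, 48)
    patches = torch.randn(2, 100, 16)   # e.g. a 10x10 grid of height-map patches
    feat, weights = enc(proprio, patches)
    print(feat.shape, weights.shape)    # torch.Size([2, 64]) torch.Size([2, 1, 100])
```

In a full controller, the encoded map feature would be concatenated with the proprioceptive observation and fed to the policy network, with everything trained jointly by reinforcement learning; the details above (patch size, embedding width, single attention layer) are placeholders, not the paper's settings.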

Paper in Science Robotics

arXiv
