More from Solving the decision problem

ai agents are local-first clients

sync engines finally have a killer app

5 days ago 6 votes
Full Stack AI Agents

a UI for every man, woman, child, and ai agent

2 weeks ago 6 votes
Durable Objects Callbacks are Weird

but it's also convenient to solve human-in-the-loop for ai agents

3 weeks ago 6 votes
Reliable UX for AI chat with Durable Objects

What it says on the tin.

3 weeks ago 7 votes

More in AI

Using LLMs effectively isn't about prompting

When people talk about using language models effectively they mainly talk about prompting: sharing great prompts, or lists of tips for…

15 hours ago 2 votes
On OpenAI's Model Spec 2.0

OpenAI made major revisions to their Model Spec.

an hour ago 2 votes
GenAI in two words: “Success Theater”

Success Theater

20 hours ago 2 votes
Reinforcement Learning Triples Spot’s Running Speed

About a year ago, Boston Dynamics released a research version of its Spot quadruped robot, which comes with a low-level application programming interface (API) that allows direct control of Spot’s joints. Even back then, the rumor was that this API unlocked some significant performance improvements on Spot, including a much faster running speed. That rumor came from the Robotics and AI (RAI) Institute, formerly The AI Institute, formerly the Boston Dynamics AI Institute, and if you were at Marc Raibert’s talk at the ICRA@40 conference in Rotterdam last fall, you already know that it turned out not to be a rumor at all. Today, we’re able to share some of the work that the RAI Institute has been doing to apply reality-grounded reinforcement learning techniques to enable much higher performance from Spot. The same techniques can also help highly dynamic robots operate robustly, and there’s a brand new hardware platform that shows this off: an autonomous bicycle that can jump.

See Spot Run

This video shows Spot running at a sustained speed of 5.2 meters per second (11.6 miles per hour). Out of the box, Spot’s top speed is 1.6 meters per second, meaning that RAI’s Spot has more than tripled (!) the quadruped’s factory speed.

If Spot running this quickly looks a little strange, that’s probably because it is strange, in the sense that the way this robot dog’s legs and body move as it runs is not very much like a real dog at all. “The gait is not biological, but the robot isn’t biological,” explains Farbod Farshidian, roboticist at the RAI Institute. “Spot’s actuators are different from muscles, and its kinematics are different, so a gait that’s suitable for a dog to run fast isn’t necessarily best for this robot.” The closest Farshidian can come to categorizing how Spot moves is that it’s somewhat similar to a trotting gait, except with an added flight phase (with all four feet off the ground at once) that technically turns it into a run. This flight phase is necessary, Farshidian says, because the robot needs that time to successively pull its feet forward fast enough to maintain its speed. This is a “discovered behavior,” in that the robot was not explicitly programmed to “run,” but rather was just required to find the best way of moving as fast as possible.

Reinforcement Learning Versus Model Predictive Control

The Spot controller that ships with the robot when you buy it from Boston Dynamics is based on model predictive control (MPC), which involves creating a software model that approximates the dynamics of the robot as best you can, and then solving an optimization problem for the tasks that you want the robot to do in real time. It’s a very predictable and reliable method for controlling a robot, but it’s also somewhat rigid, because that original software model won’t be close enough to reality to let you really push the limits of the robot. And if you try to say, “okay, I’m just going to make a super detailed software model of my robot and push the limits that way,” you get stuck, because the optimization problem has to be solved for whatever you want the robot to do, in real time, and the more complex the model is, the harder it is to do that quickly enough to be useful. Reinforcement learning (RL), on the other hand, learns offline. You can use as complex a model as you want, and then take all the time you need in simulation to train a control policy that can then be run very efficiently on the robot.
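To make that tradeoff concrete, here is a minimal, self-contained sketch. It is not RAI’s or Boston Dynamics’ code: the toy 1-D dynamics, the limits, and the crude random policy search standing in for a real RL algorithm like PPO are all invented for illustration. The MPC-style controller re-solves a small optimization at every tick against a simplified model, while the policy is trained offline against a more detailed model that includes a power limit, then deployed as one cheap evaluation per tick:

```python
import numpy as np

rng = np.random.default_rng(0)
DT, TORQUE_MAX, POWER_MAX = 0.02, 4.0, 6.0   # toy limits; POWER_MAX plays the role of the battery cap

def step(state, u, detailed=True):
    """Toy 1-D joint: state = [position, velocity]. The detailed model also enforces
    a power limit (|torque * velocity| <= POWER_MAX), standing in for the battery
    constraint the article describes."""
    pos, vel = state
    u = float(np.clip(u, -TORQUE_MAX, TORQUE_MAX))
    if detailed and abs(u * vel) > POWER_MAX:
        u = np.sign(u) * POWER_MAX / max(abs(vel), 1e-6)
    vel = vel + DT * (u - 0.1 * vel)              # simple damped dynamics
    return np.array([pos + DT * vel, vel])

def mpc_action(state, horizon=15, samples=64):
    """MPC flavour: re-solve a small optimization online at every tick, using the
    SIMPLIFIED model (detailed=False) so the solve stays fast enough."""
    best_u, best_err = 0.0, np.inf
    for _ in range(samples):
        seq = rng.uniform(-TORQUE_MAX, TORQUE_MAX, horizon)
        s = state.copy()
        for u in seq:
            s = step(s, u, detailed=False)
        err = abs(s[1] - 3.0)                     # objective: reach velocity 3.0
        if err < best_err:
            best_err, best_u = err, seq[0]
    return best_u

def train_policy_offline(iters=300):
    """RL flavour (crude random policy search standing in for PPO and friends):
    train for as long as we like against the DETAILED model, then deploy cheaply."""
    best_theta, best_ret = np.zeros(3), -np.inf   # linear policy on [pos, vel, 1]
    for _ in range(iters):
        cand = best_theta + rng.normal(0.0, 0.5, 3)
        s, ret = np.array([0.0, 0.0]), 0.0
        for _ in range(200):
            s = step(s, cand @ np.array([s[0], s[1], 1.0]), detailed=True)
            ret -= abs(s[1] - 3.0)                # reward: track velocity 3.0
        if ret > best_ret:
            best_ret, best_theta = ret, cand
    return best_theta

theta = train_policy_offline()                    # expensive, but done once, offline
s_mpc, s_pol = np.array([0.0, 0.0]), np.array([0.0, 0.0])
for _ in range(200):
    s_mpc = step(s_mpc, mpc_action(s_mpc))                            # online solve per tick
    s_pol = step(s_pol, theta @ np.array([s_pol[0], s_pol[1], 1.0]))  # one dot product per tick
print(f"final velocity  MPC (simplified model): {s_mpc[1]:.2f}   policy (detailed sim): {s_pol[1]:.2f}")
```

The point is structural: the online controller pays for its model fidelity at every control cycle, while the offline-trained policy pays once, during training.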
In simulation, a couple of Spots (or hundreds of Spots) can be trained in parallel for robust real-world performance. (Video: Robotics and AI Institute)

In the example of Spot’s top speed, it’s simply not possible to model every last detail for all of the robot’s actuators within a model-based control system that would run in real time on the robot. So instead, simplified (and typically very conservative) assumptions are made about what the actuators are actually doing so that you can expect safe and reliable performance. Farshidian explains that these assumptions make it difficult to develop a useful understanding of what performance limitations actually are. “Many people in robotics know that one of the limitations of running fast is that you’re going to hit the torque and velocity maximum of your actuation system. So, people try to model that using the data sheets of the actuators. For us, the question that we wanted to answer was whether there might exist some other phenomena that was actually limiting performance.”

Searching for these other phenomena involved bringing new data into the reinforcement learning pipeline, like detailed actuator models learned from the real-world performance of the robot. In Spot’s case, that provided the answer to high-speed running. It turned out that what was limiting Spot’s speed was not the actuators themselves, nor any of the robot’s kinematics: it was simply the batteries not being able to supply enough power. “This was a surprise for me,” Farshidian says, “because I thought we were going to hit the actuator limits first.” Spot’s power system is complex enough that there’s likely some additional wiggle room, and Farshidian says the only thing that prevented them from pushing Spot’s top speed past 5.2 m/s is that they didn’t have access to the battery voltages, so they weren’t able to incorporate that real-world data into their RL model. “If we had beefier batteries on there, we could have run faster. And if you model that phenomena as well in our simulator, I’m sure that we can push this farther.”

Farshidian emphasizes that RAI’s technique is about much more than just getting Spot to run fast—it could also be applied to making Spot move more efficiently to maximize battery life, or more quietly to work better in an office or home environment. Essentially, this is a generalizable tool that can find new ways of expanding the capabilities of any robotic system. And when real-world data is used to make a simulated robot better, you can ask the simulation to do more, with confidence that those simulated skills will successfully transfer back onto the real robot.
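The article doesn’t spell out how those learned actuator models are built, but the general recipe is well known: log what the joints were commanded to do alongside what they actually did (plus signals like joint velocity and battery voltage), fit a model to that data, and use the fitted model inside the simulator in place of data-sheet assumptions. The sketch below is only a hedged illustration of that recipe; the synthetic log, the linear least-squares fit, and all names are invented stand-ins, not RAI’s pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real hardware log: commanded torque, joint velocity, and battery
# voltage, with delivered torque sagging as voltage drops. On the real robot these
# rows would come from telemetry; here they are synthesized purely for illustration.
cmd  = rng.uniform(-40, 40, 5000)            # commanded torque (N·m)
vel  = rng.uniform(-15, 15, 5000)            # joint velocity (rad/s)
volt = rng.uniform(45, 58, 5000)             # battery voltage (V)
delivered = cmd * np.clip((volt - 44.0) / 10.0, 0.4, 1.0) - 0.3 * vel + rng.normal(0.0, 0.5, 5000)

# Fit a simple actuator model: delivered ~ w . [cmd, vel, volt, 1]. A small neural
# network is the more common choice; plain least squares keeps the sketch dependency-free.
X = np.stack([cmd, vel, volt, np.ones_like(cmd)], axis=1)
w, *_ = np.linalg.lstsq(X, delivered, rcond=None)

def actuator_model(cmd_torque, joint_vel, batt_volt):
    """Predict the torque the joint will actually produce for a given command.
    Dropping a fitted model like this into the simulator replaces conservative
    data-sheet limits with behavior measured on the real hardware."""
    return float(w @ np.array([cmd_torque, joint_vel, batt_volt, 1.0]))

print(actuator_model(30.0, 5.0, 48.0))
```

With a model like this inside the simulator, a training run can surface constraints (such as voltage sag under load) that a data-sheet model would never expose, which is how the battery, rather than the actuators, showed up as Spot’s real speed limit.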
Ultra Mobility Vehicle: Teaching Robot Bikes to Jump

Reinforcement learning isn’t just good for maximizing the performance of a robot—it can also make that performance more reliable. The RAI Institute has been experimenting with a completely new kind of robot that they invented in-house: a little jumping bicycle called the Ultra Mobility Vehicle, or UMV, which was trained to do parkour using essentially the same RL pipeline for balancing and driving as was used for Spot’s high-speed running. There’s no independent physical stabilization system (like a gyroscope) keeping the UMV from falling over; it’s just a normal bike that can move forwards and backwards and turn its front wheel. As much mass as possible is then packed into the top bit, which actuators can rapidly accelerate up and down.

“We’re demonstrating two things in this video,” says Marco Hutter, director of the RAI Institute’s Zurich office. “One is how reinforcement learning helps make the UMV very robust in its driving capabilities in diverse situations. And second, how understanding the robot’s dynamic capabilities allows us to do new things, like jumping on a table which is higher than the robot itself.”

“The key of RL in all of this is to discover new behavior and make this robust and reliable under conditions that are very hard to model. That’s where RL really, really shines.” —Marco Hutter, The RAI Institute

As impressive as the jumping is, for Hutter, it’s just as difficult (if not more difficult) to do maneuvers that may seem fairly simple, like riding backwards. “Going backwards is highly unstable,” Hutter explains. “At least for us, it was not really possible to do that with a classical [MPC] controller, particularly over rough terrain or with disturbances.”

Getting this robot out of the lab and onto terrain to do proper bike parkour is a work in progress that the RAI Institute says they’ll be able to demonstrate in the near future, but it’s really not about what this particular hardware platform can do—it’s about what any robot can do through RL and other learning-based methods, says Hutter. “The bigger picture here is that the hardware of such robotic systems can in theory do a lot more than we were able to achieve with our classic control algorithms. Understanding these hidden limits in hardware systems lets us improve performance and keep pushing the boundaries on control.”

Teaching the UMV to drive itself down stairs in sim results in a real robot that can handle stairs at any angle. (Video: Robotics and AI Institute)

Reinforcement Learning for Robots Everywhere

Just a few weeks ago, the RAI Institute announced a new partnership with Boston Dynamics “to advance humanoid robots through reinforcement learning.” Humanoids are just another kind of robotic platform, albeit a significantly more complicated one with many more degrees of freedom and things to model and simulate. But when considering the limitations of model predictive control for this level of complexity, a reinforcement learning approach seems almost inevitable, especially when such an approach is already streamlined due to its ability to generalize.

“One of the ambitions that we have as an institute is to have solutions which span across all kinds of different platforms,” says Hutter. “It’s about building tools, about building infrastructure, building the basis for this to be done in a broader context. So not only humanoids, but driving vehicles, quadrupeds, you name it. But doing RL research and showcasing some nice first proof of concept is one thing—pushing it to work in the real world under all conditions, while pushing the boundaries in performance, is something else.”

Transferring skills into the real world has always been a challenge for robots trained in simulation, precisely because simulation is so friendly to robots. “If you spend enough time,” Farshidian explains, “you can come up with a reward function where eventually the robot will do what you want. What often fails is when you want to transfer that sim behavior to the hardware, because reinforcement learning is very good at finding glitches in your simulator and leveraging them to do the task.”
Simulation has been getting much, much better, with new tools, more accurate dynamics, and lots of computing power to throw at the problem. “It’s a hugely powerful ability that we can simulate so many things, and generate so much data almost for free,” Hutter says. But the usefulness of that data is in its connection to reality, making sure that what you’re simulating is accurate enough that a reinforcement learning approach will in fact solve for reality. Bringing physical data collected on real hardware back into the simulation, Hutter believes, is a very promising approach, whether it’s applied to running quadrupeds or jumping bicycles or humanoids. “The combination of the two—of simulation and reality—that’s what I would hypothesize is the right direction.”
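The article doesn’t name specific countermeasures to that simulator-exploitation problem, but one standard technique in this space is domain randomization: vary the simulator’s physical parameters during training so the policy can’t overfit to any single (inevitably wrong) model, and use data logged on hardware to keep those ranges honest. A generic, hedged sketch of the idea follows; all parameter names, ranges, and the simulator/update hooks are hypothetical, not RAI’s pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Physical parameters the simulator will randomize, with uncertainty ranges.
# In a reality-grounded setup the ranges would be narrowed using measurements
# from the real robot; the values here are placeholders.
PARAM_RANGES = {
    "mass_scale":      (0.85, 1.15),   # body mass uncertainty
    "ground_friction": (0.4, 1.2),     # friction coefficient
    "motor_delay_s":   (0.002, 0.012), # actuation latency
    "battery_sag":     (0.0, 0.15),    # voltage drop under load
}

def sample_sim_params():
    """Domain randomization: each training episode sees a slightly different world,
    so the policy cannot latch onto one simulator's quirks."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def train(make_env, update_policy, episodes=100_000):
    """Training-loop skeleton: make_env builds a simulator from sampled parameters,
    update_policy runs one RL update (e.g. a PPO step). Both are hypothetical hooks."""
    for _ in range(episodes):
        env = make_env(sample_sim_params())
        update_policy(env)

print(sample_sim_params())   # e.g. {'mass_scale': 1.03, 'ground_friction': 0.72, ...}
```

A policy that performs well across the whole sampled family of simulators is much less likely to be exploiting a glitch that exists in only one of them, which is the failure mode Farshidian describes above.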

an hour ago 1 votes
Reasoning is here to stay

AI Engineering resources 02-21-25

40 minutes ago 1 votes