More from Sam Altman
Our mission is to ensure that AGI (Artificial General Intelligence) benefits all of humanity. Systems that start to point to AGI* are coming into view, and so we think it’s important to understand the moment we are in. AGI is a weakly defined term, but generally speaking we mean it to be a system that can tackle increasingly complex problems, at human level, in many fields. People are tool-builders with an inherent drive to understand and create, which leads to the world getting better for all of us. Each new generation builds upon the discoveries of the generations before to create even more capable tools—electricity, the transistor, the computer, the internet, and soon AGI. Over time, in fits and starts, the steady march of human innovation has brought previously unimaginable levels of prosperity and improvements to almost every aspect of people’s lives. In some sense, AGI is just another tool in this ever-taller scaffolding of human progress we are building together. In another sense, it is the beginning of something for which it’s hard not to say “this time it’s different”; the economic growth in front of us looks astonishing, and we can now imagine a world where we cure all diseases, have much more time to enjoy with our families, and can fully realize our creative potential. In a decade, perhaps everyone on earth will be capable of accomplishing more than the most impactful person can today. We continue to see rapid progress with AI development. Here are three observations about the economics of AI: 1. The intelligence of an AI model roughly equals the log of the resources used to train and run it. These resources are chiefly training compute, data, and inference compute. It appears that you can spend arbitrary amounts of money and get continuous and predictable gains; the scaling laws that predict this are accurate over many orders of magnitude. 2. The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use. 
You can see this in the token cost from GPT-4 in early 2023 to GPT-4o in mid-2024, where the price per token dropped about 150x in that time period. Moore’s law changed the world at 2x every 18 months; this is unbelievably stronger. 3. The socioeconomic value of linearly increasing intelligence is super-exponential in nature. A consequence of this is that we see no reason for exponentially increasing investment to stop in the near future. If these three observations continue to hold true, the impacts on society will be significant. We are now starting to roll out AI agents, which will eventually feel like virtual co-workers. Let’s imagine the case of a software engineering agent, which is an agent that we expect to be particularly important. Imagine that this agent will eventually be capable of doing most things a software engineer at a top company with a few years of experience could do, for tasks up to a couple of days long. It will not have the biggest new ideas, it will require lots of human supervision and direction, and it will be great at some things but surprisingly bad at others. Still, imagine it as a real-but-relatively-junior virtual coworker. Now imagine 1,000 of them. Or 1 million of them. Now imagine such agents in every field of knowledge work. In some ways, AI may turn out to be like the transistor economically—a big scientific discovery that scales well and that seeps into almost every corner of the economy. We don’t think much about transistors, or transistor companies, and the gains are very widely distributed. But we do expect our computers, TVs, cars, toys, and more to perform miracles. The world will not change all at once; it never does. Life will go on mostly the same in the short run, and people in 2025 will mostly spend their time in the same way they did in 2024. We will still fall in love, create families, get in fights online, hike in nature, etc. 
But the future will be coming at us in a way that is impossible to ignore, and the long-term changes to our society and economy will be huge. We will find new things to do, new ways to be useful to each other, and new ways to compete, but they may not look very much like the jobs of today. Agency, willfulness, and determination will likely be extremely valuable. Correctly deciding what to do and figuring out how to navigate an ever-changing world will have huge value; resilience and adaptability will be helpful skills to cultivate. AGI will be the biggest lever ever on human willfulness, and enable individual people to have more impact than ever before, not less. We expect the impact of AGI to be uneven. Although some industries will change very little, scientific progress will likely be much faster than it is today; this impact of AGI may surpass everything else. The price of many goods will eventually fall dramatically (right now, the cost of intelligence and the cost of energy constrain a lot of things), and the price of luxury goods and a few inherently limited resources like land may rise even more dramatically. Technically speaking, the road in front of us looks fairly clear. But public policy and collective opinion on how we should integrate AGI into society matter a lot; one of our reasons for launching products early and often is to give society and the technology time to co-evolve. AI will seep into all areas of the economy and society; we will expect everything to be smart. Many of us expect to need to give people more control over the technology than we have historically, including open-sourcing more, and accept that there is a balance between safety and individual empowerment that will require trade-offs. 
While we never want to be reckless and there will likely be some major decisions and limitations related to AGI safety that will be unpopular, directionally, as we get closer to achieving AGI, we believe that trending more towards individual empowerment is important; the other likely path we can see is AI being used by authoritarian governments to control their population through mass surveillance and loss of autonomy. Ensuring that the benefits of AGI are broadly distributed is critical. The historical impact of technological progress suggests that most of the metrics we care about (health outcomes, economic prosperity, etc.) get better on average and over the long term, but increasing equality does not seem technologically determined and getting this right may require new ideas. In particular, it does seem like the balance of power between capital and labor could easily get messed up, and this may require early intervention. We are open to strange-sounding ideas like giving some “compute budget” to enable everyone on Earth to use a lot of AI, but we can also see a lot of ways where just relentlessly driving the cost of intelligence as low as possible has the desired effect. Anyone in 2035 should be able to marshal the intellectual capacity equivalent to everyone in 2025; everyone should have access to unlimited genius to direct however they can imagine. There is a great deal of talent right now without the resources to fully express itself, and if we change that, the resulting creative output of the world will lead to tremendous benefits for us all. Thanks especially to Josh Achiam, Boaz Barak and Aleksander Madry for reviewing drafts of this. *By using the term AGI here, we aim to communicate clearly, and we do not intend to alter or interpret the definitions and processes that define our relationship with Microsoft. We fully expect to be partnered with Microsoft for the long term. 
This footnote seems silly, but on the other hand we know some journalists will try to get clicks by writing something silly so here we are pre-empting the silliness…
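As a rough check on the second observation in the post above, the quoted rates can be put on a common per-year footing. The sketch below is illustrative arithmetic only: the function and variable names are hypothetical, steady compounding is assumed, and the 14-to-18-month window for "early 2023 to mid-2024" is an assumption, since the post gives no exact dates.

```python
"""Illustrative arithmetic for the cost-decline rates quoted in the post."""


def annualized_factor(total_factor: float, months: float) -> float:
    """Return the equivalent per-12-month factor for a change of
    `total_factor` observed over `months` months, assuming steady
    compounding throughout the period."""
    if total_factor <= 0 or months <= 0:
        raise ValueError("total_factor and months must be positive")
    return total_factor ** (12.0 / months)


def compounded(factor_per_year: float, years: float) -> float:
    """Return the cumulative factor after `years` of steady compounding."""
    return factor_per_year ** years


# Rates as stated in the post.
moore_per_year = annualized_factor(2.0, 18.0)     # 2x every 18 months
ai_cost_per_year = annualized_factor(10.0, 12.0)  # 10x every 12 months

# ASSUMPTION: the post says only "early 2023" to "mid-2024", so the
# 150x price drop is annualized over several plausible window lengths.
gpt4_to_gpt4o_per_year = {
    months: annualized_factor(150.0, months) for months in (14, 16, 18)
}

if __name__ == "__main__":
    print(f"Moore's law: {moore_per_year:.2f}x per year")
    print(f"Claimed AI cost decline: {ai_cost_per_year:.0f}x per year")
    for months, factor in gpt4_to_gpt4o_per_year.items():
        print(f"150x over {months} months: {factor:.1f}x per year")
    print(f"Moore's law over a decade: {compounded(moore_per_year, 10):.0f}x")
    print(f"10x per year over a decade: {compounded(ai_cost_per_year, 10):.0e}x")
```

Under these assumptions, Moore's law compounds to roughly 1.59x per year and about 100x per decade, while a steady 10x per year would compound to 10^10 over the same decade. The 150x drop works out to roughly 28x per year even at the 18-month end of the assumed window, which is above the post's headline rate of 10x per year.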
The second birthday of ChatGPT was only a little over a month ago, and now we have transitioned into the next paradigm of models that can do complex reasoning. New years get people in a reflective mood, and I wanted to share some personal thoughts about how it has gone so far, and some of the things I’ve learned along the way. As we get closer to AGI, it feels like an important time to look at the progress of our company. There is still so much to understand, still so much we don’t know, and it’s still so early. But we know a lot more than we did when we started. We started OpenAI almost nine years ago because we believed that AGI was possible, and that it could be the most impactful technology in human history. We wanted to figure out how to build it and make it broadly beneficial; we were excited to try to make our mark on history. Our ambitions were extraordinarily high and so was our belief that the work might benefit society in an equally extraordinary way. At the time, very few people cared, and if they did, it was mostly because they thought we had no chance of success. In 2022, OpenAI was a quiet research lab working on something temporarily called “Chat With GPT-3.5”. (We are much better at research than we are at naming things.) We had been watching people use the playground feature of our API and knew that developers were really enjoying talking to the model. We thought building a demo around that experience would show people something important about the future and help us make our models better and safer. We ended up mercifully calling it ChatGPT instead, and launched it on November 30th of 2022. We always knew, abstractly, that at some point we would hit a tipping point and the AI revolution would get kicked off. But we didn’t know what the moment would be. To our surprise, it turned out to be this. The launch of ChatGPT kicked off a growth curve like nothing we have ever seen—in our company, our industry, and the world broadly. 
We are finally seeing some of the massive upside we have always hoped for from AI, and we can see how much more will come soon. It hasn’t been easy. The road hasn’t been smooth and the right choices haven’t been obvious. In the last two years, we had to build an entire company, almost from scratch, around this new technology. There is no way to train people for this except by doing it, and when the technology category is completely new, there is no one at all who can tell you exactly how it should be done. Building up a company at such high velocity with so little training is a messy process. It’s often two steps forward, one step back (and sometimes, one step forward and two steps back). Mistakes get corrected as you go along, but there aren’t really any handbooks or guideposts when you’re doing original work. Moving at speed in uncharted waters is an incredible experience, but it is also immensely stressful for all the players. Conflicts and misunderstanding abound. These years have been the most rewarding, fun, best, interesting, exhausting, stressful, and—particularly for the last two—unpleasant years of my life so far. The overwhelming feeling is gratitude; I know that someday I’ll be retired at our ranch watching the plants grow, a little bored, and will think back at how cool it was that I got to do the work I dreamed of since I was a little kid. I try to remember that on any given Friday, when seven things go badly wrong by 1 pm. A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was that I got fired by surprise on a video call, and then right after we hung up the board published a blog post about it. I was in a hotel room in Las Vegas. It felt, to a degree that is almost impossible to explain, like a dream gone wrong. Getting fired in public with no warning kicked off a really crazy few hours, and a pretty crazy few days. The “fog of war” was the strangest part. 
None of us were able to get satisfactory answers about what had happened, or why. The whole event was, in my opinion, a big failure of governance by well-meaning people, myself included. Looking back, I certainly wish I had done things differently, and I’d like to believe I’m a better, more thoughtful leader today than I was a year ago. I also learned the importance of a board with diverse viewpoints and broad experience in managing a complex set of challenges. Good governance requires a lot of trust and credibility. I appreciate the way so many people worked together to build a stronger system of governance for OpenAI that enables us to pursue our mission of ensuring that AGI benefits all of humanity. My biggest takeaway is how much I have to be thankful for and how many people I owe gratitude towards: to everyone who works at OpenAI and has chosen to spend their time and effort going after this dream, to friends who helped us get through the crisis moments, to our partners and customers who supported us and entrusted us to enable their success, and to the people in my life who showed me how much they cared. [1] We all got back to the work in a more cohesive and positive way and I’m very proud of our focus since then. We have done what is easily some of our best research ever. We grew from about 100 million weekly active users to more than 300 million. Most of all, we have continued to put technology out into the world that people genuinely seem to love and that solves real problems. Nine years ago, we really had no idea what we were eventually going to become; even now, we only sort of know. AI development has taken many twists and turns and we expect more in the future. Some of the twists have been joyful; some have been hard. It’s been fun watching a steady stream of research miracles occur, and a lot of naysayers have become true believers. We’ve also seen some colleagues split off and become competitors. 
Teams tend to turn over as they scale, and OpenAI scales really fast. I think some of this is unavoidable—startups usually see a lot of turnover at each new major level of scale, and at OpenAI numbers go up by orders of magnitude every few months. The last two years have been like a decade at a normal company. When any company grows and evolves so fast, interests naturally diverge. And when any company in an important industry is in the lead, lots of people attack it for all sorts of reasons, especially when they are trying to compete with it. Our vision won’t change; our tactics will continue to evolve. For example, when we started we had no idea we would have to build a product company; we thought we were just going to do great research. We also had no idea we would need such a crazy amount of capital. There are new things we have to go build now that we didn’t understand a few years ago, and there will be new things in the future we can barely imagine now. We are proud of our track-record on research and deployment so far, and are committed to continuing to advance our thinking on safety and benefits sharing. We continue to believe that the best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with the technology, learning from experience, and continuing to make the technology safer. We believe in the importance of being world leaders on safety and alignment research, and in guiding that research with feedback from real world applications. We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes. We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. 
We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity. This sounds like science fiction right now, and somewhat crazy to even talk about it. That’s alright—we’ve been there before and we’re OK with being there again. We’re pretty confident that in the next few years, everyone will see what we see, and that the need to act with great care, while still maximizing broad benefit and empowerment, is so important. Given the possibilities of our work, OpenAI cannot be a normal company. How lucky and humbling it is to be able to play a role in this work. (Thanks to Josh Tyrangiel for sort of prompting this. I wish we had had a lot more time.) [1] There were a lot of people who did incredible and gigantic amounts of work to help OpenAI, and me personally, during those few days, but two people stood out from all others. Ron Conway and Brian Chesky went so far above and beyond the call of duty that I’m not even sure how to describe it. I’ve of course heard stories about Ron’s ability and tenaciousness for years and I’ve spent a lot of time with Brian over the past couple of years getting a huge amount of help and advice. But there’s nothing quite like being in the foxhole with people to see what they can really do. I am reasonably confident OpenAI would have fallen apart without their help; they worked around the clock for days until things were done. Although they worked unbelievably hard, they stayed calm and had clear strategic thought and great advice throughout. They stopped me from making several mistakes and made none themselves. They used their vast networks for everything needed and were able to navigate many complex situations. And I’m sure they did a lot of things I don’t know about. 
What I will remember most, though, is their care, compassion, and support. I thought I knew what it looked like to support a founder and a company, and in some small sense I did. But I have never before seen, or even heard of, anything like what these guys did, and now I get more fully why they have the legendary status they do. They are different and both fully deserve their genuinely unique reputations, but they are similar in their remarkable ability to move mountains and help, and in their unwavering commitment in times of need. The tech industry is far better off for having both of them in it. There are others like them; it is an amazingly special thing about our industry and does much more to make it all work than people realize. I look forward to paying it forward. On a more personal note, thanks especially to Ollie for his support that weekend and always; he is incredible in every way and no one could ask for a better partner.
Optimism, obsession, self-belief, raw horsepower and personal connections are how things get started. Cohesive teams, the right combination of calmness and urgency, and unreasonable commitment are how things get finished. Long-term orientation is in short supply; try not to worry about what people think in the short term, which will get easier over time. It is easier for a team to do a hard thing that really matters than to do an easy thing that doesn’t really matter; audacious ideas motivate people. Incentives are superpowers; set them carefully. Concentrate your resources on a small number of high-conviction bets; this is easy to say but evidently hard to do. You can delete more stuff than you think. Communicate clearly and concisely. Fight bullshit and bureaucracy every time you see it and get other people to fight it too. Do not let the org chart get in the way of people working productively together. Outcomes are what count; don’t let good process excuse bad results. Spend more time recruiting. Take risks on high-potential people with a fast rate of improvement. Look for evidence of getting stuff done in addition to intelligence. Superstars are even more valuable than they seem, but you have to evaluate people on their net impact on the performance of the organization. Fast iteration can make up for a lot; it’s usually ok to be wrong if you iterate quickly. Plans should be measured in decades, execution should be measured in weeks. Don’t fight the business equivalent of the laws of physics. Inspiration is perishable and life goes by fast. Inaction is a particularly insidious type of risk. Scale often has surprising emergent properties. Compounding exponentials are magic. In particular, you really want to build a business that gets a compounding advantage with scale. Get back up and keep going. Working with great people is one of the best parts of life.
Helion has been progressing even faster than I expected and is on pace in 2024 to 1) demonstrate Q > 1 fusion and 2) resolve all questions needed to design a mass-producible fusion generator. The goals of the company are quite ambitious—clean, continuous energy for 1 cent per kilowatt-hour, and the ability to manufacture enough power plants to satisfy the current electrical demand of earth in a ten-year period. If both things happen, it will transform the world. Abundant, clean, and radically inexpensive energy will elevate the quality of life for all of us—think about how much the cost of energy factors into what we do and use. Also, electricity at this price will allow us to do things like efficiently capture carbon (so although we’ll still rely on gasoline for a while, it’ll be ok). Although Helion’s scientific progress of the past 8 years is phenomenal and necessary, it is not sufficient to rapidly get to this new energy economy. Helion now needs to figure out how to engineer machines that don’t break, how to build a factory and supply chain capable of manufacturing a machine every day, how to work with power grids and governments around the world, and more. The biggest input to the degree and speed of success at the company is now the talent of the people who join the team. Here are a few of the most critical jobs, but please don’t let the lack of a perfect fit deter you from applying.
Electrical Engineer, Low Voltage: https://boards.greenhouse.io/helionenergy/jobs/4044506005
Electrical Engineer, Pulsed Power: https://boards.greenhouse.io/helionenergy/jobs/4044510005
Mechanical Engineer, Generator Systems: https://boards.greenhouse.io/helionenergy/jobs/4044522005
Manager of Mechanical Engineering: https://boards.greenhouse.io/helionenergy/jobs/4044521005
(All current jobs: https://www.helionenergy.com/careers/)
More in AI
Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.
RoboCup German Open: 12–16 March 2025, NUREMBERG, GERMANY
German Robotics Conference: 13–15 March 2025, NUREMBERG, GERMANY
European Robotics Forum: 25–27 March 2025, STUTTGART, GERMANY
RoboSoft 2025: 23–26 April 2025, LAUSANNE, SWITZERLAND
ICUAS 2025: 14–17 May 2025, CHARLOTTE, NC
ICRA 2025: 19–23 May 2025, ATLANTA, GA
London Humanoids Summit: 29–30 May 2025, LONDON
IEEE RCAR 2025: 1–6 June 2025, TOYAMA, JAPAN
2025 Energy Drone & Robotics Summit: 16–18 June 2025, HOUSTON, TX
RSS 2025: 21–25 June 2025, LOS ANGELES
ETH Robotics Summer School: 21–27 June 2025, GENEVA
IAS 2025: 30 June–4 July 2025, GENOA, ITALY
ICRES 2025: 3–4 July 2025, PORTO, PORTUGAL
IEEE World Haptics: 8–11 July 2025, SUWON, KOREA
IFAC Symposium on Robotics: 15–18 July 2025, PARIS
RoboCup 2025: 15–21 July 2025, BAHIA, BRAZIL
Enjoy today’s videos!
We’re introducing Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics.
This is moderately impressive; my favorite part is probably the hand-offs and that extra little bit of HRI with what we’d call eye contact if these robots had faces. But keep in mind that you’re looking at close to best case for robotic manipulation, and that if the robots had been given the bag instead of well-spaced objects on a single color background, or if the fridge had a normal human amount of stuff in it, they might be having a much different time of it. Also, is it just me, or is the sound on this video very weird? Like, some things make noise, some things don’t, and the robots themselves occasionally sound more like someone just added in some ‘soft actuator sound’ or something. 
Also also, I’m of a suspicious nature, and when there is an abrupt cut between ‘robot grasps door’ and ‘robot opens door,’ I assume the worst. [ Figure ]
Researchers at EPFL have developed a highly agile flat swimming robot. This robot is smaller than a credit card, and propels itself on the water surface using a pair of undulating soft fins. The fins are driven at resonance by artificial muscles, allowing the robot to perform complex maneuvers. In the future, this robot can be used to monitor water quality or to help measure fertilizer concentrations in rice fields. [ Paper ] via [ Science Robotics ]
I don’t know about you, but I always dance better when getting beaten with a stick. [ Unitree Robotics ]
This is big news, people: Sweet Bite Ham Ham, one of the greatest and most useless robots of all time, has a new treat. All yours for about $100, overseas shipping included. [ Ham Ham ] via [ Robotstart ]
MagicLab has announced the launch of its first generation self-developed dexterous hand product, the MagicHand S01. The MagicHand S01 has 11 degrees of freedom in a single hand. The MagicHand S01 has a hand load capacity of up to 5 kilograms, and in work environments, can carry loads of over 20 kilograms. [ MagicLab ] Thanks, Ni Tao!
No, I’m not creeped out at all, why? [ Clone Robotics ]
Happy 40th Birthday to the MIT Media Lab! Since 1985, the MIT Media Lab has provided a home for interdisciplinary research, transformative technologies, and innovative approaches to solving some of humanity’s greatest challenges. As we celebrate our 40th anniversary year, we’re looking ahead to decades more of imagining, designing, and inventing a future in which everyone has the opportunity to flourish. 
[ MIT Media Lab ]
While most soft pneumatic grippers that operate with a single control parameter (such as pressure or airflow) are limited to a single grasping modality, this article introduces a new method for incorporating multiple grasping modalities into vacuum-driven soft grippers. This is achieved by combining stiffness manipulation with a bistable mechanism. Adjusting the airflow tunes the energy barrier of the bistable mechanism, enabling changes in triggering sensitivity and allowing swift transitions between grasping modes. This results in an exceptionally versatile gripper, capable of handling a diverse range of objects with varying sizes, shapes, stiffness, and roughness, controlled by a single parameter, airflow, and its interaction with objects. [ Paper ] via [ BruBotics ] Thanks, Bram!
In this article, we present a design concept, in which a monolithic soft body is incorporated with a vibration-driven mechanism, called Leafbot. This proposed investigation aims to build a foundation for further terradynamics study of vibration-driven soft robots in a more complicated and confined environment, with potential applications in inspection tasks. [ Paper ] via [ IEEE Transactions on Robots ]
We present a hybrid aerial-ground robot that combines the versatility of a quadcopter with enhanced terrestrial mobility. The vehicle features a passive, reconfigurable single wheeled leg, enabling seamless transitions between flight and two ground modes: a stable stance and a dynamic cruising configuration. [ Robotics and Intelligent Systems Laboratory ]
I’m not sure I’ve ever seen this trick performed by a robot with soft fingers before. [ Paper ]
There are a lot of robots involved in car manufacturing. Like, a lot. [ Kawasaki Robotics ]
Steve Willits shows us some recent autonomous drone work being done at the AirLab at CMU’s Robotics Institute. [ Carnegie Mellon University Robotics Institute ]
Somebody’s got to test all those luxury handbags and purses. 
And by somebody, I mean somerobot. [ Qb Robotics ]
Do not trust people named Evan. [ Tufts University Human-Robot Interaction Lab ]
Meet the Mind: MIT Professor Andreea Bobu. [ MIT ]
About a year ago, Boston Dynamics released a research version of its Spot quadruped robot, which comes with a low-level application programming interface (API) that allows direct control of Spot’s joints. Even back then, the rumor was that this API unlocked some significant performance improvements on Spot, including a much faster running speed. That rumor came from the Robotics and AI (RAI) Institute, formerly The AI Institute, formerly the Boston Dynamics AI Institute, and if you were at Marc Raibert’s talk at the ICRA@40 conference in Rotterdam last fall, you already know that it turned out not to be a rumor at all. Today, we’re able to share some of the work that the RAI Institute has been doing to apply reality-grounded reinforcement learning techniques to enable much higher performance from Spot. The same techniques can also help highly dynamic robots operate robustly, and there’s a brand new hardware platform that shows this off: an autonomous bicycle that can jump.
See Spot Run
This video shows Spot running at a sustained speed of 5.2 meters per second (11.6 miles per hour). Out of the box, Spot’s top speed is 1.6 meters per second, meaning that RAI’s Spot has more than tripled (!) the quadruped’s factory speed. If Spot running this quickly looks a little strange, that’s probably because it is strange, in the sense that the way this robot dog’s legs and body move as it runs is not very much like a real dog at all. “The gait is not biological, but the robot isn’t biological,” explains Farbod Farshidian, roboticist at the RAI Institute. “Spot’s actuators are different from muscles, and its kinematics are different, so a gait that’s suitable for a dog to run fast isn’t necessarily best for this robot.” The closest Farshidian can come to categorizing Spot’s movement is that it’s somewhat similar to a trotting gait, except with an added flight phase (with all four feet off the ground at once) that technically turns it into a run. 
This flight phase is necessary, Farshidian says, because the robot needs that time to successively pull its feet forward fast enough to maintain its speed. This is a “discovered behavior,” in that the robot was not explicitly programmed to “run,” but rather was just required to find the best way of moving as fast as possible.
Reinforcement Learning Versus Model Predictive Control
The Spot controller that ships with the robot when you buy it from Boston Dynamics is based on model predictive control (MPC), which involves creating a software model that approximates the dynamics of the robot as best you can, and then solving an optimization problem for the tasks that you want the robot to do in real time. It’s a very predictable and reliable method for controlling a robot, but it’s also somewhat rigid, because that original software model won’t be close enough to reality to let you really push the limits of the robot. And if you try to say, “okay, I’m just going to make a super detailed software model of my robot and push the limits that way,” you get stuck because the optimization problem has to be solved for whatever you want the robot to do, in real time, and the more complex the model is, the harder it is to do that quickly enough to be useful. Reinforcement learning (RL), on the other hand, learns offline. You can use as complex of a model as you want, and then take all the time you need in simulation to train a control policy that can then be run very efficiently on the robot.
In simulation, a couple of Spots (or hundreds of Spots) can be trained in parallel for robust real-world performance. (Credit: Robotics and AI Institute)
In the example of Spot’s top speed, it’s simply not possible to model every last detail for all of the robot’s actuators within a model-based control system that would run in real time on the robot. 
So instead, simplified (and typically very conservative) assumptions are made about what the actuators are actually doing so that you can expect safe and reliable performance. Farshidian explains that these assumptions make it difficult to develop a useful understanding of what performance limitations actually are. “Many people in robotics know that one of the limitations of running fast is that you’re going to hit the torque and velocity maximum of your actuation system. So, people try to model that using the data sheets of the actuators. For us, the question that we wanted to answer was whether there might exist some other phenomena that was actually limiting performance.”

Searching for these other phenomena involved bringing new data into the reinforcement learning pipeline, like detailed actuator models learned from the real-world performance of the robot. In Spot’s case, that provided the answer to high-speed running. It turned out that what was limiting Spot’s speed was not the actuators themselves, nor any of the robot’s kinematics: It was simply the batteries not being able to supply enough power. “This was a surprise for me,” Farshidian says, “because I thought we were going to hit the actuator limits first.”

Spot’s power system is complex enough that there’s likely some additional wiggle room, and Farshidian says the only thing that prevented them from pushing Spot’s top speed past 5.2 m/s is that they didn’t have access to the battery voltages, so they weren’t able to incorporate that real-world data into their RL model. “If we had beefier batteries on there, we could have run faster. And if you model that phenomena as well in our simulator, I’m sure that we can push this farther.”

Farshidian emphasizes that RAI’s technique is about much more than just getting Spot to run fast. It could also be applied to making Spot move more efficiently to maximize battery life, or more quietly to work better in an office or home environment.
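The difference between per-actuator limits and a shared power budget can be illustrated with a small check. This is only a sketch under loud assumptions: the limit values and the names `TORQUE_MAX`, `VELOCITY_MAX`, `BATTERY_POWER_MAX`, and `limiting_factor` are hypothetical placeholders, not Spot specifications or anything from RAI's pipeline. It shows how every joint can stay inside its data-sheet envelope while the summed power demand still exceeds what a battery could supply.

```python
from typing import Sequence

# Hypothetical limits, chosen only for illustration (not Spot's real values).
TORQUE_MAX = 40.0           # N*m, per-joint torque limit
VELOCITY_MAX = 20.0         # rad/s, per-joint velocity limit
BATTERY_POWER_MAX = 2000.0  # W, total power the battery can deliver


def mechanical_power(torques: Sequence[float], velocities: Sequence[float]) -> float:
    """Total mechanical power demand in watts.

    Simplification: uses |torque * velocity| per joint, so electrical losses
    and energy recovered during braking are ignored.
    """
    return sum(abs(t * w) for t, w in zip(torques, velocities))


def limiting_factor(torques: Sequence[float], velocities: Sequence[float]) -> str:
    """Return which constraint is violated first: 'actuator', 'battery', or 'none'."""
    over_joint_limit = any(
        abs(t) > TORQUE_MAX or abs(w) > VELOCITY_MAX
        for t, w in zip(torques, velocities)
    )
    if over_joint_limit:
        return "actuator"
    if mechanical_power(torques, velocities) > BATTERY_POWER_MAX:
        return "battery"
    return "none"


# Twelve joints, each well inside its own limits (30 N*m at 15 rad/s),
# still add up to 12 * 450 W = 5400 W, which exceeds the assumed battery budget.
print(limiting_factor([30.0] * 12, [15.0] * 12))
```

A check based only on actuator data sheets would report no problem for this operating point, which is the kind of blind spot Farshidian describes.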
Essentially, this is a generalizable tool that can find new ways of expanding the capabilities of any robotic system. And when real-world data is used to make a simulated robot better, you can ask the simulation to do more, with confidence that those simulated skills will successfully transfer back onto the real robot.

Ultra Mobility Vehicle: Teaching Robot Bikes to Jump

Reinforcement learning isn’t just good for maximizing the performance of a robot; it can also make that performance more reliable. The RAI Institute has been experimenting with a completely new kind of robot that they invented in-house: a little jumping bicycle called the Ultra Mobility Vehicle, or UMV, which was trained to do parkour using essentially the same RL pipeline for balancing and driving as was used for Spot’s high-speed running.

There’s no independent physical stabilization system (like a gyroscope) keeping the UMV from falling over; it’s just a normal bike that can move forwards and backwards and turn its front wheel. As much mass as possible is then packed into the top bit, which actuators can rapidly accelerate up and down. “We’re demonstrating two things in this video,” says Marco Hutter, director of the RAI Institute’s Zurich office. “One is how reinforcement learning helps make the UMV very robust in its driving capabilities in diverse situations. And second, how understanding the robots’ dynamic capabilities allows us to do new things, like jumping on a table which is higher than the robot itself.”

“The key of RL in all of this is to discover new behavior and make this robust and reliable under conditions that are very hard to model. That’s where RL really, really shines.” (Marco Hutter, The RAI Institute)

As impressive as the jumping is, for Hutter, it’s just as difficult (if not more difficult) to do maneuvers that may seem fairly simple, like riding backwards. “Going backwards is highly unstable,” Hutter explains.
“At least for us, it was not really possible to do that with a classical [MPC] controller, particularly over rough terrain or with disturbances.”

Getting this robot out of the lab and onto terrain to do proper bike parkour is a work in progress that the RAI Institute says they’ll be able to demonstrate in the near future, but it’s really not about what this particular hardware platform can do; it’s about what any robot can do through RL and other learning-based methods, says Hutter. “The bigger picture here is that the hardware of such robotic systems can in theory do a lot more than we were able to achieve with our classic control algorithms. Understanding these hidden limits in hardware systems lets us improve performance and keep pushing the boundaries on control.”

Teaching the UMV to drive itself down stairs in sim results in a real robot that can handle stairs at any angle. Credit: Robotics and AI Institute

Reinforcement Learning for Robots Everywhere

Just a few weeks ago, the RAI Institute announced a new partnership with Boston Dynamics “to advance humanoid robots through reinforcement learning.” Humanoids are just another kind of robotic platform, albeit a significantly more complicated one with many more degrees of freedom and things to model and simulate. But when considering the limitations of model predictive control for this level of complexity, a reinforcement learning approach seems almost inevitable, especially when such an approach is already streamlined due to its ability to generalize.

“One of the ambitions that we have as an institute is to have solutions which span across all kinds of different platforms,” says Hutter. “It’s about building tools, about building infrastructure, building the basis for this to be done in a broader context. So not only humanoids, but driving vehicles, quadrupeds, you name it.
But doing RL research and showcasing some nice first proof of concept is one thing; pushing it to work in the real world under all conditions, while pushing the boundaries in performance, is something else.”

Transferring skills into the real world has always been a challenge for robots trained in simulation, precisely because simulation is so friendly to robots. “If you spend enough time,” Farshidian explains, “you can come up with a reward function where eventually the robot will do what you want. What often fails is when you want to transfer that sim behavior to the hardware, because reinforcement learning is very good at finding glitches in your simulator and leveraging them to do the task.”

Simulation has been getting much, much better, with new tools, more accurate dynamics, and lots of computing power to throw at the problem. “It’s a hugely powerful ability that we can simulate so many things, and generate so much data almost for free,” Hutter says. But the usefulness of that data is in its connection to reality, making sure that what you’re simulating is accurate enough that a reinforcement learning approach will in fact solve for reality. Bringing physical data collected on real hardware back into the simulation, Hutter believes, is a very promising approach, whether it’s applied to running quadrupeds or jumping bicycles or humanoids. “The combination of the two, of simulation and reality, that’s what I would hypothesize is the right direction.”
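As a rough illustration of what bringing hardware data back into the simulation can mean in its simplest form, the sketch below fits a single simulator parameter to logged measurements by least squares. Everything here is an assumption made for the example: the one-parameter actuator model (delivered torque = commanded torque minus `b` times joint velocity), the function names `fit_damping` and `synthetic_log`, and the synthetic data that stands in for a real hardware log. The article does not describe RAI's actual actuator models at this level of detail.

```python
import random
from typing import List, Sequence, Tuple


def fit_damping(
    commands: Sequence[float],
    velocities: Sequence[float],
    measured: Sequence[float],
) -> float:
    """Least-squares estimate of b in the assumed model: measured = command - b * velocity.

    Minimizing sum((measured - command + b * velocity)**2) over b gives
    b = sum(velocity * (command - measured)) / sum(velocity**2).
    """
    numerator = sum(w * (c - m) for c, w, m in zip(commands, velocities, measured))
    denominator = sum(w * w for w in velocities)
    if denominator == 0.0:
        raise ValueError("velocities must not all be zero")
    return numerator / denominator


def synthetic_log(
    true_b: float, n: int, seed: int = 0
) -> Tuple[List[float], List[float], List[float]]:
    """Generate a stand-in for a hardware log (synthetic, not real robot data)."""
    rng = random.Random(seed)
    commands = [rng.uniform(-30.0, 30.0) for _ in range(n)]
    velocities = [rng.uniform(-15.0, 15.0) for _ in range(n)]
    measured = [
        c - true_b * w + rng.gauss(0.0, 0.1)  # small sensor noise
        for c, w in zip(commands, velocities)
    ]
    return commands, velocities, measured


commands, velocities, measured = synthetic_log(true_b=0.8, n=500)
b_hat = fit_damping(commands, velocities, measured)
print(f"estimated damping: {b_hat:.3f}")  # should land close to the true value of 0.8
```

Once a parameter like this is estimated from hardware, it would replace the simulator's default value before training, so that a policy exploiting the simulator is more likely to be exploiting something that is also true of the real robot.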