A lot of people ask me what the ideal cofounder looks like.  I now have an answer: Greg Brockman. Every successful startup I know has at least one person who provides the force of will to make the startup happen.  I’d thought a lot about this in the abstract while advising YC startups, but until OpenAI I hadn’t observed up close someone else drive the formation of a startup. OpenAI wouldn’t have happened without Greg.  He commits quickly and fully to things.  I organized a group dinner early on to talk about what such an organization might look like, and drove him home afterwards.  Greg asked me questions for the first half of the drive back to San Francisco, then declared he was in, and started planning logistics for the rest of the drive. From then on he was fully in, with an average email response time of about 5 minutes to anything.  Elon and I were both busy with day jobs, but Greg kept everything moving forward with imperfect information and a very high-latency connection. He...
over a year ago


More from Sam Altman

Three Observations

Our mission is to ensure that AGI (Artificial General Intelligence) benefits all of humanity.

Systems that start to point to AGI* are coming into view, and so we think it’s important to understand the moment we are in. AGI is a weakly defined term, but generally speaking we mean it to be a system that can tackle increasingly complex problems, at human level, in many fields.

People are tool-builders with an inherent drive to understand and create, which leads to the world getting better for all of us. Each new generation builds upon the discoveries of the generations before to create even more capable tools: electricity, the transistor, the computer, the internet, and soon AGI. Over time, in fits and starts, the steady march of human innovation has brought previously unimaginable levels of prosperity and improvements to almost every aspect of people’s lives. In some sense, AGI is just another tool in this ever-taller scaffolding of human progress we are building together.

In another sense, it is the beginning of something for which it’s hard not to say “this time it’s different”; the economic growth in front of us looks astonishing, and we can now imagine a world where we cure all diseases, have much more time to enjoy with our families, and can fully realize our creative potential. In a decade, perhaps everyone on earth will be capable of accomplishing more than the most impactful person can today.

We continue to see rapid progress with AI development. Here are three observations about the economics of AI:

1. The intelligence of an AI model roughly equals the log of the resources used to train and run it. These resources are chiefly training compute, data, and inference compute. It appears that you can spend arbitrary amounts of money and get continuous and predictable gains; the scaling laws that predict this are accurate over many orders of magnitude.

2. The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use. You can see this in the token cost from GPT-4 in early 2023 to GPT-4o in mid-2024, where the price per token dropped about 150x in that time period. Moore’s law changed the world at 2x every 18 months; this is unbelievably stronger.

3. The socioeconomic value of linearly increasing intelligence is super-exponential in nature. A consequence of this is that we see no reason for exponentially increasing investment to stop in the near future.

If these three observations continue to hold true, the impacts on society will be significant.

We are now starting to roll out AI agents, which will eventually feel like virtual co-workers. Let’s imagine the case of a software engineering agent, which is an agent that we expect to be particularly important. Imagine that this agent will eventually be capable of doing most things a software engineer at a top company with a few years of experience could do, for tasks up to a couple of days long. It will not have the biggest new ideas, it will require lots of human supervision and direction, and it will be great at some things but surprisingly bad at others.

Still, imagine it as a real-but-relatively-junior virtual coworker. Now imagine 1,000 of them. Or 1 million of them. Now imagine such agents in every field of knowledge work.

In some ways, AI may turn out to be like the transistor economically: a big scientific discovery that scales well and that seeps into almost every corner of the economy. We don’t think much about transistors, or transistor companies, and the gains are very widely distributed. But we do expect our computers, TVs, cars, toys, and more to perform miracles.

The world will not change all at once; it never does. Life will go on mostly the same in the short run, and people in 2025 will mostly spend their time in the same way they did in 2024. We will still fall in love, create families, get in fights online, hike in nature, etc.
But the future will be coming at us in a way that is impossible to ignore, and the long-term changes to our society and economy will be huge. We will find new things to do, new ways to be useful to each other, and new ways to compete, but they may not look very much like the jobs of today.  Agency, willfulness, and determination will likely be extremely valuable. Correctly deciding what to do and figuring out how to navigate an ever-changing world will have huge value; resilience and adaptability will be helpful skills to cultivate. AGI will be the biggest lever ever on human willfulness, and enable individual people to have more impact than ever before, not less. We expect the impact of AGI to be uneven. Although some industries will change very little, scientific progress will likely be much faster than it is today; this impact of AGI may surpass everything else. The price of many goods will eventually fall dramatically (right now, the cost of intelligence and the cost of energy constrain a lot of things), and the price of luxury goods and a few inherently limited resources like land may rise even more dramatically. Technically speaking, the road in front of us looks fairly clear. But public policy and collective opinion on how we should integrate AGI into society matter a lot; one of our reasons for launching products early and often is to give society and the technology time to co-evolve. AI will seep into all areas of the economy and society; we will expect everything to be smart. Many of us expect to need to give people more control over the technology than we have historically, including open-sourcing more, and accept that there is a balance between safety and individual empowerment that will require trade-offs. 
While we never want to be reckless and there will likely be some major decisions and limitations related to AGI safety that will be unpopular, directionally, as we get closer to achieving AGI, we believe that trending more towards individual empowerment is important; the other likely path we can see is AI being used by authoritarian governments to control their population through mass surveillance and loss of autonomy. Ensuring that the benefits of AGI are broadly distributed is critical. The historical impact of technological progress suggests that most of the metrics we care about (health outcomes, economic prosperity, etc.) get better on average and over the long term, but increasing equality does not seem technologically determined and getting this right may require new ideas. In particular, it does seem like the balance of power between capital and labor could easily get messed up, and this may require early intervention. We are open to strange-sounding ideas like giving some “compute budget” to enable everyone on Earth to use a lot of AI, but we can also see a lot of ways where just relentlessly driving the cost of intelligence as low as possible has the desired effect. Anyone in 2035 should be able to marshal the intellectual capacity equivalent to everyone in 2025; everyone should have access to unlimited genius to direct however they can imagine. There is a great deal of talent right now without the resources to fully express itself, and if we change that, the resulting creative output of the world will lead to tremendous benefits for us all. Thanks especially to Josh Achiam, Boaz Barak and Aleksander Madry for reviewing drafts of this. *By using the term AGI here, we aim to communicate clearly, and we do not intend to alter or interpret the definitions and processes that define our relationship with Microsoft. We fully expect to be partnered with Microsoft for the long term. 
This footnote seems silly, but on the other hand we know some journalists will try to get clicks by writing something silly so here we are pre-empting the silliness…
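
As a rough check on the second observation in the essay above, the quoted rates can be put on a common per-year basis. The sketch below is illustrative only: the helper name `annualized_factor` is made up here, and treating "early 2023 to mid-2024" as about 18 months is an assumption, not a figure from the essay.

```python
def annualized_factor(total_factor: float, months: float) -> float:
    """Convert a total change factor over `months` into the equivalent
    factor per 12 months, assuming a constant compounding rate."""
    return total_factor ** (12.0 / months)


# Rates quoted in the essay.
moore_per_year = annualized_factor(2.0, 18.0)  # Moore's law: 2x every 18 months
ai_cost_per_year = 10.0                        # AI cost: about 10x every 12 months

# ASSUMPTION: the GPT-4 to GPT-4o window is taken as roughly 18 months.
ASSUMED_WINDOW_MONTHS = 18.0
gpt4_to_4o_per_year = annualized_factor(150.0, ASSUMED_WINDOW_MONTHS)

print(f"Moore's law, per year:         {moore_per_year:.2f}x")
print(f"Quoted AI cost drop, per year: {ai_cost_per_year:.1f}x")
print(f"GPT-4 -> GPT-4o, per year:     {gpt4_to_4o_per_year:.1f}x")
```

Under that 18-month assumption, the 150x drop works out to roughly 28x per year, against about 1.6x per year for Moore's law. A shorter or longer assumed window changes the annualized figure accordingly.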

a month ago 16 votes

The second birthday of ChatGPT was only a little over a month ago, and now we have transitioned into the next paradigm of models that can do complex reasoning. New years get people in a reflective mood, and I wanted to share some personal thoughts about how it has gone so far, and some of the things I’ve learned along the way. As we get closer to AGI, it feels like an important time to look at the progress of our company. There is still so much to understand, still so much we don’t know, and it’s still so early. But we know a lot more than we did when we started. We started OpenAI almost nine years ago because we believed that AGI was possible, and that it could be the most impactful technology in human history. We wanted to figure out how to build it and make it broadly beneficial; we were excited to try to make our mark on history. Our ambitions were extraordinarily high and so was our belief that the work might benefit society in an equally extraordinary way. At the time, very few people cared, and if they did, it was mostly because they thought we had no chance of success. In 2022, OpenAI was a quiet research lab working on something temporarily called “Chat With GPT-3.5”. (We are much better at research than we are at naming things.) We had been watching people use the playground feature of our API and knew that developers were really enjoying talking to the model. We thought building a demo around that experience would show people something important about the future and help us make our models better and safer. We ended up mercifully calling it ChatGPT instead, and launched it on November 30th of 2022. We always knew, abstractly, that at some point we would hit a tipping point and the AI revolution would get kicked off. But we didn’t know what the moment would be. To our surprise, it turned out to be this. The launch of ChatGPT kicked off a growth curve like nothing we have ever seen—in our company, our industry, and the world broadly. 
We are finally seeing some of the massive upside we have always hoped for from AI, and we can see how much more will come soon. It hasn’t been easy. The road hasn’t been smooth and the right choices haven’t been obvious. In the last two years, we had to build an entire company, almost from scratch, around this new technology. There is no way to train people for this except by doing it, and when the technology category is completely new, there is no one at all who can tell you exactly how it should be done. Building up a company at such high velocity with so little training is a messy process. It’s often two steps forward, one step back (and sometimes, one step forward and two steps back). Mistakes get corrected as you go along, but there aren’t really any handbooks or guideposts when you’re doing original work. Moving at speed in uncharted waters is an incredible experience, but it is also immensely stressful for all the players. Conflicts and misunderstanding abound. These years have been the most rewarding, fun, best, interesting, exhausting, stressful, and—particularly for the last two—unpleasant years of my life so far. The overwhelming feeling is gratitude; I know that someday I’ll be retired at our ranch watching the plants grow, a little bored, and will think back at how cool it was that I got to do the work I dreamed of since I was a little kid. I try to remember that on any given Friday, when seven things go badly wrong by 1 pm. A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was that I got fired by surprise on a video call, and then right after we hung up the board published a blog post about it. I was in a hotel room in Las Vegas. It felt, to a degree that is almost impossible to explain, like a dream gone wrong. Getting fired in public with no warning kicked off a really crazy few hours, and a pretty crazy few days. The “fog of war” was the strangest part. 
None of us were able to get satisfactory answers about what had happened, or why.  The whole event was, in my opinion, a big failure of governance by well-meaning people, myself included. Looking back, I certainly wish I had done things differently, and I’d like to believe I’m a better, more thoughtful leader today than I was a year ago. I also learned the importance of a board with diverse viewpoints and broad experience in managing a complex set of challenges. Good governance requires a lot of trust and credibility. I appreciate the way so many people worked together to build a stronger system of governance for OpenAI that enables us to pursue our mission of ensuring that AGI benefits all of humanity. My biggest takeaway is how much I have to be thankful for and how many people I owe gratitude towards: to everyone who works at OpenAI and has chosen to spend their time and effort going after this dream, to friends who helped us get through the crisis moments, to our partners and customers who supported us and entrusted us to enable their success, and to the people in my life who showed me how much they cared. [1] We all got back to the work in a more cohesive and positive way and I’m very proud of our focus since then. We have done what is easily some of our best research ever. We grew from about 100 million weekly active users to more than 300 million. Most of all, we have continued to put technology out into the world that people genuinely seem to love and that solves real problems. Nine years ago, we really had no idea what we were eventually going to become; even now, we only sort of know. AI development has taken many twists and turns and we expect more in the future. Some of the twists have been joyful; some have been hard. It’s been fun watching a steady stream of research miracles occur, and a lot of naysayers have become true believers. We’ve also seen some colleagues split off and become competitors. 
Teams tend to turn over as they scale, and OpenAI scales really fast. I think some of this is unavoidable—startups usually see a lot of turnover at each new major level of scale, and at OpenAI numbers go up by orders of magnitude every few months. The last two years have been like a decade at a normal company. When any company grows and evolves so fast, interests naturally diverge. And when any company in an important industry is in the lead, lots of people attack it for all sorts of reasons, especially when they are trying to compete with it. Our vision won’t change; our tactics will continue to evolve. For example, when we started we had no idea we would have to build a product company; we thought we were just going to do great research. We also had no idea we would need such a crazy amount of capital. There are new things we have to go build now that we didn’t understand a few years ago, and there will be new things in the future we can barely imagine now.  We are proud of our track-record on research and deployment so far, and are committed to continuing to advance our thinking on safety and benefits sharing. We continue to believe that the best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with the technology, learning from experience, and continuing to make the technology safer. We believe in the importance of being world leaders on safety and alignment research, and in guiding that research with feedback from real world applications. We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes. We are beginning to turn our aim beyond that, to superintelligence in the true sense of the word. 
We love our current products, but we are here for the glorious future. With superintelligence, we can do anything else. Superintelligent tools could massively accelerate scientific discovery and innovation well beyond what we are capable of doing on our own, and in turn massively increase abundance and prosperity. This sounds like science fiction right now, and somewhat crazy to even talk about it. That’s alright—we’ve been there before and we’re OK with being there again. We’re pretty confident that in the next few years, everyone will see what we see, and that the need to act with great care, while still maximizing broad benefit and empowerment, is so important. Given the possibilities of our work, OpenAI cannot be a normal company. How lucky and humbling it is to be able to play a role in this work. (Thanks to Josh Tyrangiel for sort of prompting this. I wish we had had a lot more time.) [1] There were a lot of people who did incredible and gigantic amounts of work to help OpenAI, and me personally, during those few days, but two people stood out from all others. Ron Conway and Brian Chesky went so far above and beyond the call of duty that I’m not even sure how to describe it. I’ve of course heard stories about Ron’s ability and tenaciousness for years and I’ve spent a lot of time with Brian over the past couple of years getting a huge amount of help and advice. But there’s nothing quite like being in the foxhole with people to see what they can really do. I am reasonably confident OpenAI would have fallen apart without their help; they worked around the clock for days until things were done. Although they worked unbelievably hard, they stayed calm and had clear strategic thought and great advice throughout. They stopped me from making several mistakes and made none themselves. They used their vast networks for everything needed and were able to navigate many complex situations. And I’m sure they did a lot of things I don’t know about. 
What I will remember most, though, is their care, compassion, and support. I thought I knew what it looked like to support a founder and a company, and in some small sense I did. But I have never before seen, or even heard of, anything like what these guys did, and now I get more fully why they have the legendary status they do. They are different and both fully deserve their genuinely unique reputations, but they are similar in their remarkable ability to move mountains and help, and in their unwavering commitment in times of need. The tech industry is far better off for having both of them in it. There are others like them; it is an amazingly special thing about our industry and does much more to make it all work than people realize. I look forward to paying it forward. On a more personal note, thanks especially to Ollie for his support that weekend and always; he is incredible in every way and no one could ask for a better partner.

2 months ago 62 votes

There are two things from our announcement today I wanted to highlight. First, a key part of our mission is to put very capable AI tools in the hands of people for free (or at a great price). I am very proud that we’ve made the best model in the world available for free in ChatGPT, without ads or anything like that.  Our initial conception when we started OpenAI was that we’d create AI and use it to create all sorts of benefits for the world. Instead, it now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from.  We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people.  Second, the new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change. The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful. Talking to a computer has never felt really natural for me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before. Finally, huge thanks to the team that poured so much work into making this happen!

10 months ago 145 votes
What I Wish Someone Had Told Me

Optimism, obsession, self-belief, raw horsepower and personal connections are how things get started. Cohesive teams, the right combination of calmness and urgency, and unreasonable commitment are how things get finished. Long-term orientation is in short supply; try not to worry about what people think in the short term, which will get easier over time. It is easier for a team to do a hard thing that really matters than to do an easy thing that doesn’t really matter; audacious ideas motivate people. Incentives are superpowers; set them carefully. Concentrate your resources on a small number of high-conviction bets; this is easy to say but evidently hard to do. You can delete more stuff than you think. Communicate clearly and concisely. Fight bullshit and bureaucracy every time you see it and get other people to fight it too. Do not let the org chart get in the way of people working productively together. Outcomes are what count; don’t let good process excuse bad results. Spend more time recruiting. Take risks on high-potential people with a fast rate of improvement. Look for evidence of getting stuff done in addition to intelligence. Superstars are even more valuable than they seem, but you have to evaluate people on their net impact on the performance of the organization. Fast iteration can make up for a lot; it’s usually ok to be wrong if you iterate quickly. Plans should be measured in decades, execution should be measured in weeks. Don’t fight the business equivalent of the laws of physics. Inspiration is perishable and life goes by fast. Inaction is a particularly insidious type of risk. Scale often has surprising emergent properties. Compounding exponentials are magic. In particular, you really want to build a business that gets a compounding advantage with scale. Get back up and keep going. Working with great people is one of the best parts of life.

a year ago 98 votes
Helion Needs You

Helion has been progressing even faster than I expected and is on pace in 2024 to 1) demonstrate Q > 1 fusion and 2) resolve all questions needed to design a mass-producible fusion generator. The goals of the company are quite ambitious: clean, continuous energy for 1 cent per kilowatt-hour, and the ability to manufacture enough power plants to satisfy the current electrical demand of Earth in a ten-year period. If both things happen, it will transform the world. Abundant, clean, and radically inexpensive energy will elevate the quality of life for all of us; think about how much the cost of energy factors into what we do and use. Also, electricity at this price will allow us to do things like efficiently capture carbon (so although we’ll still rely on gasoline for a while, it’ll be OK). Although Helion’s scientific progress of the past 8 years is phenomenal and necessary, it is not sufficient to rapidly get to this new energy economy. Helion now needs to figure out how to engineer machines that don’t break, how to build a factory and supply chain capable of manufacturing a machine every day, how to work with power grids and governments around the world, and more. The biggest input to the degree and speed of success at the company is now the talent of the people who join the team. Here are a few of the most critical jobs, but please don’t let the lack of a perfect fit deter you from applying.

Electrical Engineer, Low Voltage: https://boards.greenhouse.io/helionenergy/jobs/4044506005
Electrical Engineer, Pulsed Power: https://boards.greenhouse.io/helionenergy/jobs/4044510005
Mechanical Engineer, Generator Systems: https://boards.greenhouse.io/helionenergy/jobs/4044522005
Manager of Mechanical Engineering: https://boards.greenhouse.io/helionenergy/jobs/4044521005

(All current jobs: https://www.helionenergy.com/careers/)

over a year ago 29 votes

More in AI

Is GitHub Lying Here?

My partners and I keep getting this spam-like email. I figured it was just a forgery. However, I went on my own to our organization’s GitHub administration page and a similar message lives there. We run a small group, so I am pretty sure nobody has in fact asked for […]

10 hours ago 1 vote
Kyiv Start-Up Tests Unified Controller for Robots and Drones

Ukraine’s young tech entrepreneurs think that a combination of robots and lessons from war-gaming could turn the tide in the war against Russia. They are developing an intelligent operating system to enable a single controller to remotely operate swarms of interconnected drones and cannon-equipped land robots. The tech, they say, could help Ukraine cope with Russia’s numerical advantage. Kyiv-based start-up Ark Robotics is conducting trials on an embryonic version of such a system in cooperation with one of the brigades of Ukraine’s ground forces. The company emerged about a year ago, when a group of young roboticists heard a speech by one of the Ukrainian commanders detailing challenges on the frontline. “At that time, we were building unmanned ground vehicles [UGVs],” Andryi Udovychenko, Ark Robotics’s operations lead, told IEEE Spectrum on the sidelines of the Brave 1 Defense Tech Innovations Forum held in Kyiv last month. “But we heard that what we had [to offer] wasn’t enough. They said they needed something more.” Since the war began, a vibrant defense tech innovation ecosystem has emerged in Ukraine, which began modestly with modifying Chinese-made DJI Mavic drones to make up for the lack of artillery. Today, Ukraine is a drone-making powerhouse. Dozens of startup companies are churning out newer and better tech and rapidly refining it to improve the effectiveness of the beleaguered nation’s troops. First-person-view drones have become a symbol of this war, but since last year they have begun to be complemented by UGVs, which help on the ground with logistics and evacuation of the wounded, and also act as a new means of attack. The new approach allows the Ukrainians to keep their soldiers away from the battlefield for longer periods but doesn’t erase the fact that Ukraine has far fewer soldiers than Russia does. 
“Every single drone needs one operator, complicated drones need two or three operators, and we don’t have that many people,” Serhii Kupriienko, the CEO and founder of Swarmer, said during a panel at the Kyiv event. Swarmer is a Kyiv-based start-up developing technologies to allow groups of drones to operate as one self-coordinated swarm. Ark Robotics is trying to take that idea a step further. The company’s Frontier OS aspires to become a unifying interface that would allow drones and UGVs made by various makers to work together under the control of operators seated in control rooms miles away from the action.

One Controller for Many Drones and Robots

“We have many types of drones that are using different controls, different interfaces and it’s really hard to build cohesion,” Udovychenko says. “To move forward, we need a system where we can control multiple different types of vehicles in a cohesive manner in complex operations.” Udovychenko, a gaming enthusiast, is excited about the progress Ark Robotics has made. It could be a game-changer, he says, a new foundational technology for defense. It would make Ukraine “like Protoss,” the fictional technologically advanced nation in the military science fiction strategy game StarCraft. But what powers him is much more than youthful geekiness. Building up Ukraine’s technological dominance is a mission fueled by grief and outrage. “I don’t want to lose any more friends,” he remarks at one point, becoming visibly emotional. “We don’t want to be dying in the trenches, but we need to be able to defend our country and given that the societal math doesn’t favor us, we need to make our own math to win.”

Soldiers at an undisclosed location used laptops to test software from Ark Robotics. (Image: Ark Robotics)

The scope of the challenge isn’t lost on him. 
The company has so far built a vehicle computing unit that serves as a central hub and control board for various unmanned vehicles including flying drones, UGVs and even marine vehicles. “We are building this as a solution that enables the integration of various team developers and software, allowing us to extract the best components and rapidly scale them,” Udovychenko says. “This system pairs a high-performance computing module with an interface board that provides multiple connections for vehicle systems.” The platform allows a single operator to remotely guide a flock of robots but will in the future also incorporate autonomous navigation and task execution, according to Udovychenko. So far, the team has tested the technology in simple logistics exercises. For the grand vision to work, though, the biggest challenge will be maintaining reliable communication links, not only between the controller and the robotic fleet but also among the robots and drones themselves.

Tests on Ukraine Battlefields to Begin Soon

“We’re not talking about communications in a relatively safe environment when you have an LTE network that has enough bandwidth to accommodate thousands of phones,” Udovychenko notes. “At the frontline, everything is affected by electronic warfare, so you need to be able to switch between different solutions including satellite, digital radio and radio mesh so that even if you lose connection to the server, you still have connection between the drones and robots so that they can move together and maintain some level of control between them.” Udovychenko expects Ark Robotics’s partner brigade in Ukraine’s armed forces to test the early version of the tech in a real-life situation within the next couple of months. His young drone operator friends are excited, he says. And how could they not be? The technology promises to turn warfighting into a kind of real-life video game. The new class of multi-drone operators will likely be recruited from the ranks of gaming aficionados. 
“If we can take the best pilots and give them tools to combine the operations, we might see a tremendous advantage,” Udovychenko says. “It’s like in StarCraft. Some people are simply able to play the game right and obliterate their opponents within minutes even if they’re starting from the same basic conditions.” Speaking at the Brave 1 Defense Tech Innovations Forum, Colonel Andrii Lebedenko, Deputy Commander-in-Chief of the Armed Forces of Ukraine, acknowledged that land battles have so far been Ukraine’s weakest area. He said that replacing “humans with robots as much as possible” is Ukraine’s near-term goal and he expressed confidence that upcoming technologies will give greater autonomy to the robot swarms. Some roboticists, however, are more skeptical that swarms of autonomous robots will crawl en masse across the battlefields of Eastern Ukraine any time soon. “Swarming is certainly a goal we should reach but it’s much easier with FPV drones than with ground-based robots,” Ivan Movchan, CEO of the Ukrainian Scale Company, a Kharkiv-based robot maker, told Spectrum. “Navigation on the ground is more challenging simply because of the obstacles,” he adds. “But I do expect UGVs to become very common in Ukraine over the next year.”

14 hours ago 1 vote
AI #107: The Misplaced Hype Machine

The most hyped event of the week, by far, was the Manus Marketing Madness. Manus wasn’t entirely hype, but there was very little there there in that Claude wrapper.

13 hours ago 1 vote
AI Coding Fantasy meets Pac-Man

Guess who won?

2 days ago 2 votes
With Gemini Robotics, Google Aims for Smarter Robots

Generative AI models are getting closer to taking action in the real world. Already, the big AI companies are introducing AI agents that can take care of web-based busywork for you, ordering your groceries or making your dinner reservation. Today, Google DeepMind announced two generative AI models designed to power tomorrow’s robots.

The models are both built on Google Gemini, a multimodal foundation model that can process text, voice, and image data to answer questions, give advice, and generally help out. DeepMind calls the first of the new models, Gemini Robotics, an “advanced vision-language-action model,” meaning that it can take all those same inputs and then output instructions for a robot’s physical actions. The models are designed to work with any hardware system, but were mostly tested on the two-armed Aloha 2 system that DeepMind introduced last year.

In a demonstration video, a voice says: “Pick up the basketball and slam dunk it” (at 2:27). Then a robot arm carefully picks up a miniature basketball and drops it into a miniature net, and while it wasn’t an NBA-level dunk, it was enough to get the DeepMind researchers excited.

Video caption: Google DeepMind released this demo video showing off the capabilities of its Gemini Robotics foundation model to control robots. (Gemini Robotics)

“This basketball example is one of my favorites,” said Kanishka Rao, the principal software engineer for the project, in a press briefing. He explains that the robot had “never, ever seen anything related to basketball,” but that its underlying foundation model had a general understanding of the game, knew what a basketball net looks like, and understood what the term “slam dunk” meant. The robot was therefore “able to connect those [concepts] to actually accomplish the task in the physical world,” says Rao.

What Are the Advances of Gemini Robotics?
Carolina Parada, head of robotics at Google DeepMind, said in the briefing that the new models improve over the company’s prior robots in three dimensions: generalization, adaptability, and dexterity. All of these advances are necessary, she said, to create “a new generation of helpful robots.”

Generalization means that a robot can apply a concept that it has learned in one context to another situation, and the researchers looked at visual generalization (for example, does it get confused if the color of an object or background changes), instruction generalization (can it interpret commands that are worded in different ways), and action generalization (can it perform an action it had never done before).

Parada also says that robots powered by Gemini can better adapt to changing instructions and circumstances. To demonstrate that point in a video, a researcher told a robot arm to put a bunch of plastic grapes into the clear Tupperware container, then proceeded to shift three containers around on the table in an approximation of a shyster’s shell game. The robot arm dutifully followed the clear container around until it could fulfill its directive.

Video caption: Google DeepMind says Gemini Robotics is better than previous models at adapting to changing instructions and circumstances. (Google DeepMind)

As for dexterity, demo videos showed the robotic arms folding a piece of paper into an origami fox and performing other delicate tasks. However, it’s important to note that the impressive performance here is in the context of a narrow set of high-quality data that the robot was trained on for these specific tasks, so the level of dexterity that these tasks represent is not being generalized.

What Is Embodied Reasoning?

The second model introduced today is Gemini Robotics-ER, with the ER standing for “embodied reasoning,” which is the sort of intuitive physical world understanding that humans develop with experience over time.
We’re able to do clever things like look at an object we’ve never seen before and make an educated guess about the best way to interact with it, and this is what DeepMind seeks to emulate with Gemini Robotics-ER.

Parada gave an example of Gemini Robotics-ER’s ability to identify an appropriate grasping point for picking up a coffee cup. The model correctly identifies the handle, because that’s where humans tend to grasp coffee mugs. However, this illustrates a potential weakness of relying on human-centric training data: for a robot, especially a robot that might be able to comfortably handle a mug of hot coffee, a thin handle might be a much less reliable grasping point than a more enveloping grasp of the mug itself.

DeepMind’s Approach to Robotic Safety

Vikas Sindhwani, DeepMind’s head of robotic safety for the project, says the team took a layered approach to safety. It starts with classic physical safety controls that manage things like collision avoidance and stability, but also includes “semantic safety” systems that evaluate both the robot’s instructions and the consequences of following them. These systems are most sophisticated in the Gemini Robotics-ER model, says Sindhwani, which is “trained to evaluate whether or not a potential action is safe to perform in a given scenario.”

And because “safety is not a competitive endeavor,” Sindhwani says, DeepMind is releasing a new data set and what it calls the Asimov benchmark, which is intended to measure a model’s ability to understand common-sense rules of life. The benchmark contains both questions about visual scenes and text scenarios, asking models’ opinions on things like the desirability of mixing bleach and vinegar (a combination that makes chlorine gas) and putting a soft toy on a hot stove. In the press briefing, Sindhwani said that the Gemini models had “strong performance” on that benchmark, and the technical report showed that the models got more than 80 percent of questions correct.
DeepMind’s Robotic Partnerships

Back in December, DeepMind and the humanoid robotics company Apptronik announced a partnership, and Parada says that the two companies are working together “to build the next generation of humanoid robots with Gemini at its core.” DeepMind is also making its models available to an elite group of “trusted testers”: Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.

2 days ago 3 votes