Generative AI models are getting closer to taking action in the real world. Already, the big AI companies are introducing AI agents that can take care of web-based busywork for you, ordering your groceries or making your dinner reservation. Today, Google DeepMind announced two generative AI models designed to power tomorrow’s robots.

The models are both built on Google Gemini, a multimodal foundation model that can process text, voice, and image data to answer questions, give advice, and generally help out. DeepMind calls the first of the new models, Gemini Robotics, an “advanced vision-language-action model,” meaning that it can take all those same inputs and then output instructions for a robot’s physical actions. The models are designed to work with any hardware system, but were mostly tested on the two-armed Aloha 2 system that DeepMind introduced last year.

In a demonstration video, a voice says: “Pick up the basketball and slam dunk it” (at 2:27 in the video below). Then a robot arm carefully picks up a miniature basketball and drops it into a miniature net—and while it wasn’t an NBA-level dunk, it was enough to get the DeepMind researchers excited.

Google DeepMind released this demo video showing off the capabilities of its Gemini Robotics foundation model to control robots. Gemini Robotics

“This basketball example is one of my favorites,” said Kanishka Rao, the principal software engineer for the project, in a press briefing. He explains that the robot had “never, ever seen anything related to basketball,” but that its underlying foundation model had a general understanding of the game, knew what a basketball net looks like, and understood what the term “slam dunk” meant. The robot was therefore “able to connect those [concepts] to actually accomplish the task in the physical world,” says Rao.

What are the advances of Gemini Robotics?

Carolina Parada, head of robotics at Google DeepMind, said in the briefing that the new models improve over the company’s prior robots in three dimensions: generalization, adaptability, and dexterity. All of these advances are necessary, she said, to create “a new generation of helpful robots.”

Generalization means that a robot can apply a concept that it has learned in one context to another situation. The researchers looked at visual generalization (for example, does it get confused if the color of an object or background changes?), instruction generalization (can it interpret commands that are worded in different ways?), and action generalization (can it perform an action it has never done before?).

Parada also says that robots powered by Gemini can better adapt to changing instructions and circumstances. To demonstrate that point in a video, a researcher told a robot arm to put a bunch of plastic grapes into the clear Tupperware container, then proceeded to shift three containers around on the table in an approximation of a shyster’s shell game. The robot arm dutifully followed the clear container around until it could fulfill its directive.

Google DeepMind says Gemini Robotics is better than previous models at adapting to changing instructions and circumstances. Google DeepMind

As for dexterity, demo videos showed the robotic arms folding a piece of paper into an origami fox and performing other delicate tasks.
However, it’s important to note that the impressive performance here comes in the context of a narrow set of high-quality data that the robot was trained on for these specific tasks, so the level of dexterity these tasks represent is not yet generalized.

What Is Embodied Reasoning?

The second model introduced today is Gemini Robotics-ER, with the ER standing for “embodied reasoning,” which is the sort of intuitive physical-world understanding that humans develop with experience over time. We’re able to do clever things like look at an object we’ve never seen before and make an educated guess about the best way to interact with it, and this is what DeepMind seeks to emulate with Gemini Robotics-ER.

Parada gave an example of Gemini Robotics-ER’s ability to identify an appropriate grasping point for picking up a coffee cup. The model correctly identifies the handle, because that’s where humans tend to grasp coffee mugs. However, this illustrates a potential weakness of relying on human-centric training data: For a robot, especially one that might be able to comfortably handle a mug of hot coffee, a thin handle might be a much less reliable grasping point than a more enveloping grasp of the mug itself.

DeepMind’s Approach to Robotic Safety

Vikas Sindhwani, DeepMind’s head of robotic safety for the project, says the team took a layered approach to safety. It starts with classic physical safety controls that manage things like collision avoidance and stability, but also includes “semantic safety” systems that evaluate both a robot’s instructions and the consequences of following them. These systems are most sophisticated in the Gemini Robotics-ER model, says Sindhwani, which is “trained to evaluate whether or not a potential action is safe to perform in a given scenario.”

And because “safety is not a competitive endeavor,” Sindhwani says, DeepMind is releasing a new data set and what it calls the Asimov benchmark, which is intended to measure a model’s ability to understand common-sense rules of life. The benchmark contains both questions about visual scenes and text scenarios, asking for models’ opinions on things like the desirability of mixing bleach and vinegar (a combination that makes chlorine gas) and of putting a soft toy on a hot stove. In the press briefing, Sindhwani said that the Gemini models had “strong performance” on that benchmark, and the technical report showed that the models got more than 80 percent of questions correct.

DeepMind’s Robotic Partnerships

Back in December, DeepMind and the humanoid robotics company Apptronik announced a partnership, and Parada says that the two companies are working together “to build the next generation of humanoid robots with Gemini at its core.” DeepMind is also making its models available to an elite group of “trusted testers”: Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.
Video Friday is your weekly selection of awesome robotics videos, collected by your friends at IEEE Spectrum robotics. We also post a weekly calendar of upcoming robotics events for the next few months. Please send us your events for inclusion.

RoboCup German Open: 12–16 March 2025, NUREMBERG, GERMANY
German Robotics Conference: 13–15 March 2025, NUREMBERG, GERMANY
European Robotics Forum: 25–27 March 2025, STUTTGART, GERMANY
RoboSoft 2025: 23–26 April 2025, LAUSANNE, SWITZERLAND
ICUAS 2025: 14–17 May 2025, CHARLOTTE, NC
ICRA 2025: 19–23 May 2025, ATLANTA, GA
London Humanoids Summit: 29–30 May 2025, LONDON
IEEE RCAR 2025: 1–6 June 2025, TOYAMA, JAPAN
2025 Energy Drone & Robotics Summit: 16–18 June 2025, HOUSTON, TX
RSS 2025: 21–25 June 2025, LOS ANGELES
ETH Robotics Summer School: 21–27 June 2025, GENEVA
IAS 2025: 30 June–4 July 2025, GENOA, ITALY
ICRES 2025: 3–4 July 2025, PORTO, PORTUGAL
IEEE World Haptics: 8–11 July 2025, SUWON, KOREA
IFAC Symposium on Robotics: 15–18 July 2025, PARIS
RoboCup 2025: 15–21 July 2025, BAHIA, BRAZIL

Enjoy today’s videos!

Last year, we unveiled the new Atlas—faster, stronger, more compact, and less messy. We’re designing the world’s most dynamic humanoid robot to do anything and everything, but we get there one step at a time. Our first task is part sequencing, a common logistics task in automotive manufacturing. Discover why we started with sequencing, how we are solving hard problems, and how we’re delivering a humanoid robot with real value.

My favorite part is 1:40, where Atlas squats down to pick a part up off the ground.

[ Boston Dynamics ]

I’m mostly impressed that making contact with that stick doesn’t cause the robot to fall over.

[ Unitree ]

Professor Patrícia Alves-Oliveira is studying the authenticity of artworks co-created by an artist and a robot. Her research lab, Robot Studio, is developing methods to authenticate artworks by analyzing their entire creative process. This is accomplished by using the artist’s biometrics as well as the process of artwork creation, from the first brushstroke to the final painting. This work aims to bring ownership back to artists in the age of generative AI.

[ Robot Studio ] at [ University of Michigan ]

Hard to believe that RoMeLa has been developing humanoid robots for 20 (!) years. Here’s to 20 more!

[ RoMeLa ] at [ University of California Los Angeles ]

In this demo, Reachy 2 autonomously sorts healthy and unhealthy foods. No machine learning, no pre-trained AI—just real-time object detection!

[ Pollen ]

Biological snakes achieve high mobility with numerous joints, inspiring snake-like robots for rescue and inspection. However, conventional designs feature a limited number of joints. This paper presents an underactuated snake robot consisting of many passive links that can dynamically change its joint-coupling configuration by repositioning motor-driven joint units along internal rack gears. Furthermore, a soft robot skin wirelessly powers the units, eliminating wire tangling and disconnection risks.

[ Paper ]

Thanks, Ayato!

Tech United Eindhoven is working on quadrupedal soccer robots, which should be fun.

[ Tech United ]

Autonomous manipulation in everyday tasks requires flexible action generation to handle complex, diverse real-world environments, such as objects with varying hardness and softness. Imitation Learning (IL) enables robots to learn complex tasks from expert demonstrations.
However, many existing methods rely on position/unilateral control, leaving challenges in tasks that require force information/control, like carefully grasping fragile or varying-hardness objects. To address these challenges, we introduce Bilateral Control-Based Imitation Learning via Action Chunking with Transformers (Bi-ACT) and “A Low-cost Physical Hardware Considering Diverse Motor Control Modes for Research in Everyday Bimanual Robotic Manipulation” (ALPHA-α).

[ Alpha-Biact ]

Thanks, Masato!

Powered by UBTECH’s revolutionary framework “BrainNet,” a team of Walker S1 humanoid robots works together to master complex tasks at Zeekr’s Smart Factory! Teamwork makes the dream of robots work.

[ UBTECH ]

Personal mobile robotic assistants are expected to find wide applications in industry and healthcare. However, manually steering a robot while in motion requires significant concentration from the operator, especially in tight or crowded spaces. This work presents a virtual leash with which a robot can naturally follow an operator. We successfully validate the robustness and performance of our entire pipeline in real-world experiments on the ANYmal platform.

[ ETH Zurich Robotic Systems Lab ]

I do not ever want to inspect a wind turbine blade from the inside.

[ Flyability ]

Sometimes you can learn more about a robot from an instructional unboxing video than from a fancy demo.

[ DEEP Robotics ]

Researchers at Penn Engineering have discovered that certain features of AI-governed robots carry security vulnerabilities and weaknesses that were previously unidentified and unknown. Funded by the National Science Foundation and the Army Research Laboratory, the research aims to address these emerging vulnerabilities to ensure the safe deployment of large language models (LLMs) in robotics.

[ RoboPAIR ]

ReachBot is a joint project between Stanford and NASA to explore a new approach to mobility in challenging environments such as Martian caves. It consists of a compact robot body with very long extending arms, based on booms used for extendable antennas. The booms unroll from a coil and can extend many meters in low gravity. In this talk I will introduce the ReachBot design and motion planning considerations, report on a field test with a single ReachBot arm in a lava tube in the Mojave Desert, and discuss future plans, which include the possibility of mounting one or more ReachBot arms equipped with wrists and grippers on a mobile platform, such as ANYmal.

[ ReachBot ]
Although they’re a staple of sci-fi movies and conspiracy theories, in real life tiny flying microbots—weighed down by batteries and electronics—have struggled to get very far. But a new combination of circuits and lightweight solid-state batteries, called a “flying batteries” topology, could let these bots really take off, potentially powering microbots for hours from a system that weighs milligrams.

Microbots could be an important technology for finding people buried in rubble or scouting ahead in other dangerous situations. But they’re a difficult engineering challenge, says Patrick Mercier, an electrical and computer engineering professor at the University of California, San Diego. Mercier’s student Zixiao Lin described the new circuit last month at the IEEE International Solid-State Circuits Conference (ISSCC).

“You have these really tiny robots, and you want them to last as long as possible in the field,” Mercier says. “The best way to do that is to use lithium-ion batteries, because they have the best energy density. But there’s this fundamental problem, where the actuators need much higher voltage than what the battery is capable of providing.”

A lithium cell can provide about 4 volts, but piezoelectric actuators for microbots need tens to hundreds of volts, explains Mercier. Researchers, including Mercier’s own group, have developed circuits such as boost converters to pump up the voltage. But because they need relatively large inductors or a bunch of capacitors, these add too much mass and volume, typically taking up about as much room as the battery itself.

A new kind of solid-state battery, developed at the French national electronics laboratory CEA-Leti, offered a potential solution. The batteries are a thin-film stack of materials, including lithium cobalt oxide and lithium phosphorus oxynitride, made using semiconductor processing technology, and they can be diced up into tiny cells. A 0.33-cubic-millimeter, 0.8-milligram cell can store 20 microampere-hours of charge, or about 60 ampere-hours per liter. (Lithium-ion earbud batteries provide more than 100 Ah/L, but are about 1,000 times as large.) A CEA-Leti spinoff based on the technology, Inject Power, in Grenoble, France, is gearing up to begin volume manufacturing in late 2026.

Stacking Batteries on the Fly

The solid-state battery’s ability to be diced up into tiny cells suggested that researchers could achieve high voltages using a circuit that needs no capacitors or inductors. Instead, the circuit actively rearranges the connections among many tiny batteries, moving them from parallel to serial and back again.

Imagine a microdrone that moves by flapping wings attached to a piezoelectric actuator. On its circuit board are a dozen or so of the solid-state microbatteries. Each battery is part of a circuit consisting of four transistors. These act as switches that can dynamically change the connection to that battery’s neighbor so that it is either parallel, so they share the same voltage, or serial, so their voltages add.

At the start, all the batteries are in parallel, delivering a voltage that is nowhere near enough to trigger the actuator. The 2-square-millimeter IC the UCSD team built then begins opening and closing the transistor switches. This rearranges the connections between the cells so that first two cells are connected serially, then three, then four, and so on. In a few hundredths of a second, the batteries are all connected in series, and the voltage has piled so much charge onto the actuator that it snaps the microbot’s wings down.
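That stepwise restacking is the key to the efficiency argument the article returns to below: charging a capacitive load in many small voltage increments dissipates far less energy in the switches than applying the full stack voltage in one jump. Here is a minimal Python sketch of that idea; the cell count, the actuator capacitance, and the simple ideal-capacitor model of the actuator are assumptions chosen for illustration, not values from the UCSD design.

```python
# Illustrative sketch only (not the UCSD control code): model the "flying
# batteries" idea of restacking identical cells from parallel to series one
# cell at a time, so a piezo actuator (treated here as an ideal capacitor)
# is charged in small voltage steps. All numbers are assumed placeholders.

V_CELL = 4.0      # volts per cell (CEA-Leti cells provide about 4 V)
N_CELLS = 16      # cells the IC can place in series (assumed)
C_ACT = 100e-9    # actuator capacitance in farads (assumed)

def switching_loss(v_start: float, v_end: float, c: float) -> float:
    """Energy dissipated in the switch resistance when an ideal source pulls
    a capacitor from v_start to v_end: 0.5 * C * (dV)^2, independent of the
    resistance value. Smaller steps mean less total loss (adiabatic-style)."""
    return 0.5 * c * (v_end - v_start) ** 2

def stepped_charge(n_steps: int):
    """Charge the actuator by adding one cell to the series stack at a time."""
    v_act, loss = 0.0, 0.0
    for k in range(1, n_steps + 1):
        v_target = k * V_CELL              # stack voltage with k cells in series
        loss += switching_loss(v_act, v_target, C_ACT)
        v_act = v_target                   # actuator settles to the new stack voltage
    return v_act, loss

v_final, loss_stepped = stepped_charge(N_CELLS)
loss_single = switching_loss(0.0, N_CELLS * V_CELL, C_ACT)  # all cells stacked at once
stored = 0.5 * C_ACT * v_final**2

print(f"final actuator voltage: {v_final:.0f} V")
print(f"energy stored in actuator: {stored * 1e6:.0f} uJ")
print(f"loss, single jump: {loss_single * 1e6:.0f} uJ")
print(f"loss, {N_CELLS} steps: {loss_stepped * 1e6:.0f} uJ (about {loss_single / loss_stepped:.0f}x less)")
```

With these placeholder values, stepping through 16 cells cuts the switching loss by roughly a factor of 16. That is the general pattern: the loss falls in proportion to the number of steps, at the cost of charging the actuator more slowly.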
The IC then unwinds the process, making the batteries parallel again, one at a time.

The integrated circuit in the “flying battery” has a total area of 2 square millimeters. Patrick Mercier

Adiabatic Charging

Why not just connect every battery in series at once instead of going through this ramping-up-and-down scheme? In a word, efficiency. As long as the battery serialization and parallelization is done at a low-enough frequency, the system is charging adiabatically. That is, its power losses are minimized. But it’s what happens after the actuator triggers “where the real magic comes in,” says Mercier. The piezoelectric actuator in the circuit acts like a capacitor, storing energy. “Just like you have regenerative braking in a car, we can recover some of the energy that we stored in this actuator.” As each battery is unstacked, the remaining energy-storage system has a lower voltage than the actuator, so some charge flows back into the batteries.

The UCSD team tested two varieties of solid-state microbatteries—a 1.5-volt ceramic version from Tokyo-based TDK (CeraCharge 1704-SSB) and a 4-volt custom design from CEA-Leti. With 1.6 grams of TDK cells, the circuit reached 56.1 volts and delivered a power density of 79 milliwatts per gram; with 0.014 grams of the custom cells, it maxed out at 68 volts and demonstrated a power density of 4,500 mW/g.

Mercier plans to test the system with robotics partners while his team and CEA-Leti work to improve the flying batteries system’s packaging, miniaturization, and other properties. One important characteristic that needs work is the internal resistance of the microbatteries. “The challenge there is that the more you stack, the higher the series resistance is, and therefore the lower the frequency we can operate the system,” he says. Nevertheless, Mercier seems bullish on flying batteries’ chances of keeping microbots aloft. “Adiabatic charging with charge recovery and no passives: Those are two wins that help increase flight time.”
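A few lines of arithmetic can sanity-check how those reported figures relate to one another. The series cell counts below are inferences from the quoted per-cell voltages, not numbers reported by the team.

```python
# Quick back-of-the-envelope checks on the figures quoted above, derived
# only from the numbers in the article (no additional measurements).

# CEA-Leti cell: 20 microampere-hours in 0.33 cubic millimeters
charge_ah = 20e-6             # ampere-hours
volume_liters = 0.33e-6       # 1 mm^3 = 1e-6 L
print(f"volumetric capacity: {charge_ah / volume_liters:.0f} Ah/L")   # ~61 Ah/L

# How many cells in series do the reported peak voltages imply?
print(f"TDK stack: ~{56.1 / 1.5:.0f} cells at 1.5 V each")            # ~37 cells
print(f"CEA-Leti stack: ~{68 / 4.0:.0f} cells at 4 V each")           # ~17 cells

# Total output power implied by the reported power densities
print(f"TDK: {79e-3 * 1.6 * 1e3:.0f} mW from 1.6 g of cells")         # ~126 mW
print(f"CEA-Leti: {4500e-3 * 0.014 * 1e3:.0f} mW from 0.014 g")       # ~63 mW
```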
Salto has been one of our favorite robots since we were first introduced to it in 2016 as a project out of Ron Fearing’s lab at UC Berkeley. The palm-sized spring-loaded jumping robot has gone from barely being able to chain together a few open-loop jumps to mastering landings, bouncing around outside, powering through obstacle courses, and occasionally exploding. What’s quite unusual about Salto is that it’s still an active research project—nine years is an astonishingly long lifetime for any robot, especially one without any immediately obvious practical applications. But one of Salto’s original creators, Justin Yim (who is now a professor at the University of Illinois), has found a niche where Salto might be able to do what no other robot can: mid-air sampling of the water geysering out of the frigid surface of Enceladus, a moon of Saturn.

What makes Enceladus so interesting is that it’s completely covered in a 40-kilometer-thick sheet of ice, and underneath that ice is a 10-kilometer-deep global ocean. And within that ocean can be found—we know not what. Diving in that buried ocean is a problem that robots may be able to solve at some point, but in the near(er) term, Enceladus’ south pole is home to over a hundred cryovolcanoes that spew plumes of water vapor and all kinds of other stuff right out into space, offering a sampling opportunity to any robot that can get close enough for a sip.

“We can cover large distances, we can get over obstacles, we don’t require an atmosphere, and we don’t pollute anything.” —Justin Yim, University of Illinois

Yim, along with another Salto veteran, Ethan Schaler (now at JPL), has been awarded funding through NASA’s Innovative Advanced Concepts (NIAC) program to turn Salto into a robot that can perform “Legged Exploration Across the Plume,” or in an only moderately strained backronym, LEAP. LEAP would be a space-ified version of Salto with a couple of major modifications allowing it to operate in a freezing, airless, low-gravity environment.

Exploring Enceladus’ Challenging Terrain

As best as we can make out from images taken during Cassini flybys, the surface of Enceladus is unfriendly to traditional rovers, covered in ridges and fissures, although we don’t have very much information on the exact properties of the terrain. There’s also essentially no atmosphere, meaning that you can’t fly using aerodynamics, and if you use rockets to fly instead, you run the risk of your exhaust contaminating any samples that you take. “This doesn’t leave us with a whole lot of options for getting around, but one that seems like it might be particularly suitable is jumping,” Yim tells us. “We can cover large distances, we can get over obstacles, we don’t require an atmosphere, and we don’t pollute anything.” And with Enceladus’ gravity being just 1/80th that of Earth, Salto’s meter-high jump on Earth would enable it to travel a hundred meters or so on Enceladus, taking samples as it soars through cryovolcano plumes.

The current version of Salto does require an atmosphere, because it uses a pair of propellers as tiny thrusters to control yaw and roll. On LEAP, those thrusters would be replaced with an angled pair of reaction wheels. To deal with the terrain, the robot will also likely need a foot that can handle jumping from (and landing on) surfaces composed of granular ice particles.
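A quick ballistic estimate, sketched below, shows why the one-meter Earth hop mentioned above scales to something on the order of a hundred meters on Enceladus. It assumes the same takeoff speed in both places, simple projectile motion, and the 1/80 gravity figure quoted in the article.

```python
import math

# A minimal sketch of how jump distance scales with gravity. Assumes simple
# ballistic motion, no atmosphere (true on Enceladus), the same takeoff speed
# as on Earth, and the 1/80 gravity ratio quoted in the article.

g_earth = 9.81               # m/s^2
g_enceladus = g_earth / 80   # per the article's approximation

# Takeoff speed needed for a 1-meter-high vertical jump on Earth
v = math.sqrt(2 * g_earth * 1.0)        # ~4.4 m/s

# The same takeoff speed on Enceladus gives:
apex = v**2 / (2 * g_enceladus)         # vertical apex, ~80 m
range_45 = v**2 / g_enceladus           # ballistic range at a 45-degree launch, ~160 m

print(f"takeoff speed: {v:.1f} m/s")
print(f"vertical apex on Enceladus: {apex:.0f} m")
print(f"range at 45 degrees: {range_45:.0f} m")
```

The apex works out to about 80 meters and the 45-degree range to about 160 meters, so “a hundred meters or so” is the right order of magnitude.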
LEAP is designed to jump through Enceladus’ many plumes to collect samples, and to use the moon’s terrain to direct subsequent jumps. NASA/Justin Yim

While the vision is for LEAP to jump continuously, bouncing over the surface and through plumes in a controlled series of hops, sooner or later it’s going to have a bad landing, and the robot has to be prepared for that. “I think one of the biggest new technological developments is going to be multimodal locomotion,” explains Yim. “Specifically, we’d like to have a robust ability to handle falls.” The reaction wheels can help with this in two ways: They offer some protection by acting like a shell around the robot, and they can also operate as a regular pair of wheels, allowing the robot to roll around on the ground a little bit. “With some maneuvers that we’re experimenting with now, the reaction wheels might also be able to help the robot to pop itself back upright so that it can start jumping again after it falls over,” Yim says.

A NIAC project like this is about as early-stage as it gets for something like LEAP, and an Enceladus mission is very far away as measured by almost every metric—space, time, funding, policy, you name it. Long term, the idea with LEAP is that it could be an add-on to a mission concept called the Enceladus Orbilander. This US $2.5 billion spacecraft would launch sometime in the 2030s and spend about a dozen years getting to Saturn and entering orbit around Enceladus. After 1.5 years in orbit, the spacecraft would land on the surface and spend a further two years looking for biosignatures. The Orbilander itself would be stationary, Yim explains, “so having this robotic mobility solution would be a great way to do expanded exploration of Enceladus, getting really long distance coverage to collect water samples from plumes on different areas of the surface.”

LEAP has been funded through a nine-month Phase 1 study that begins this April. While the JPL team investigates ice-foot interactions and tries to figure out how to keep the robot from freezing to death, at the University of Illinois Yim will be upgrading Salto with self-righting capability. Honestly, it’s exciting to think that after so many years, Salto may have finally found an application where it offers the best solution to this particular problem of low-gravity mobility for science.
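For readers curious how reaction wheels can stand in for Salto’s propellers in an airless environment, the toy example below shows the basic momentum-exchange relationship they rely on; the inertia values are made-up placeholders, not LEAP design numbers.

```python
# A toy illustration (not LEAP flight code) of why spinning reaction wheels
# lets a free-floating robot reorient without an atmosphere: total angular
# momentum is conserved, so torquing a wheel one way rotates the body the
# other way. The inertia values below are assumed placeholders.

I_BODY = 2.0e-3      # robot body moment of inertia, kg*m^2 (assumed)
I_WHEEL = 5.0e-5     # single reaction-wheel moment of inertia, kg*m^2 (assumed)

def body_rate_change(delta_wheel_speed_rad_s: float) -> float:
    """Body angular-velocity change that balances a change in wheel speed,
    from conservation of angular momentum about the shared axis:
    I_body * d(omega_body) = -I_wheel * d(omega_wheel)."""
    return -(I_WHEEL / I_BODY) * delta_wheel_speed_rad_s

# Spinning a wheel up by 200 rad/s (~1,900 rpm) turns the body the other way.
print(f"body rate change: {body_rate_change(200.0):.2f} rad/s")   # -5.00 rad/s
```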