Last year I participated in a Lifetime Deal (LTD) promotion to offer Preceden to the AppSumo community. Maybe I’ll dive into my experience there in another post, but I wanted to share an interesting thing that’s happening now, a year after the deal ended. AppSumo has a policy that says something like this: you have […]

a year ago 42 votes

More from Matt Mazur

2024 Year in Review

At the end of 2022 I wrapped up my contract work with Help Scout and took the plunge to work on my indie software businesses full time. I’m now two years into that adventure, and wanted to share a periodic update about how things are going. Preceden on the Back Burner I made 32 commits … Continue reading 2024 Year in Review →

2 months ago 42 votes
It’s Time to Build

It’s been a few months so I wanted to say hey to the 7 of you who follow this blog and share a few updates about what I’ve been up to. Quick recap At the start of 2023 I quit consulting to go full time on Preceden, my SaaS timeline maker, after growing it on … Continue reading It’s Time to Build →

10 months ago 123 votes
My Indie SaaS Revenue has Grown 37% per Year for 13 Years

Unlike many indie founders, I’ve never shared revenue numbers for Preceden, my SaaS timeline maker tool. Even if they were remarkable – which they are not really – I just don’t think there are many good reasons to publicly share revenue numbers, and there are lots of downsides. However, below I’ll share a chart showing … Continue reading My Indie SaaS Revenue has Grown 37% per Year for 13 Years →

a year ago 99 votes
Is the ChatGPT API Refusing to Summarize Academic Papers? Not so fast.

Yesterday on X, I shared a post about some responses I was getting from the ChatGPT 3.5 API indicating that it was refusing to summarize arXiv papers: There has been a lot of discussion recently about the perceived decrease in the quality of ChatGPT’s responses and seeing ChatGPT’s refusal here reinforced that perception for a … Continue reading Is the ChatGPT API Refusing to Summarize Academic Papers? Not so fast. →

a year ago 94 votes
Reflecting on My First Year as a Full Time Indie Founder

At the beginning of 2023 I went full time on Preceden, my SaaS timeline maker business, after 13 years of working on it on the side. A year has passed, so I wanted to share an update on how things are going and some lessons learned. Preceden My main focus in 2023 was building AI … Continue reading Reflecting on My First Year as a Full Time Indie Founder →

a year ago 96 votes

More in AI

What Mattered in GenAI in 2024

Despite the Noise, The Big Narratives from January are Still the Big Narratives

14 hours ago 2 votes
With Gemini Robotics, Google Aims for Smarter Robots

Generative AI models are getting closer to taking action in the real world. Already, the big AI companies are introducing AI agents that can take care of web-based busywork for you, ordering your groceries or making your dinner reservation. Today, Google DeepMind announced two generative AI models designed to power tomorrow’s robots.

The models are both built on Google Gemini, a multimodal foundation model that can process text, voice, and image data to answer questions, give advice, and generally help out. DeepMind calls the first of the new models, Gemini Robotics, an “advanced vision-language-action model,” meaning that it can take all those same inputs and then output instructions for a robot’s physical actions. The models are designed to work with any hardware system, but were mostly tested on the two-armed Aloha 2 system that DeepMind introduced last year.

In a demonstration video, a voice says: “Pick up the basketball and slam dunk it” (at 2:27 in the video below). Then a robot arm carefully picks up a miniature basketball and drops it into a miniature net. It wasn’t an NBA-level dunk, but it was enough to get the DeepMind researchers excited.

Google DeepMind released this demo video showing off the capabilities of its Gemini Robotics foundation model to control robots.

“This basketball example is one of my favorites,” said Kanishka Rao, the principal software engineer for the project, in a press briefing. He explains that the robot had “never, ever seen anything related to basketball,” but that its underlying foundation model had a general understanding of the game, knew what a basketball net looks like, and understood what the term “slam dunk” meant. The robot was therefore “able to connect those [concepts] to actually accomplish the task in the physical world,” says Rao.

What are the advances of Gemini Robotics?

Carolina Parada, head of robotics at Google DeepMind, said in the briefing that the new models improve over the company’s prior robots in three dimensions: generalization, adaptability, and dexterity. All of these advances are necessary, she said, to create “a new generation of helpful robots.”

Generalization means that a robot can apply a concept that it has learned in one context to another situation. The researchers looked at visual generalization (for example, does it get confused if the color of an object or background changes?), instruction generalization (can it interpret commands that are worded in different ways?), and action generalization (can it perform an action it has never done before?).

Parada also says that robots powered by Gemini can better adapt to changing instructions and circumstances. To demonstrate that point in a video, a researcher told a robot arm to put a bunch of plastic grapes into the clear Tupperware container, then proceeded to shift three containers around on the table in an approximation of a shyster’s shell game. The robot arm dutifully followed the clear container around until it could fulfill its directive.

Google DeepMind says Gemini Robotics is better than previous models at adapting to changing instructions and circumstances.

As for dexterity, demo videos showed the robotic arms folding a piece of paper into an origami fox and performing other delicate tasks. However, it’s important to note that this impressive performance comes in the context of a narrow set of high-quality data that the robot was trained on for these specific tasks, so the level of dexterity on display does not yet generalize.

What Is Embodied Reasoning?

The second model introduced today is Gemini Robotics-ER, with the ER standing for “embodied reasoning,” the sort of intuitive physical-world understanding that humans develop with experience over time. We’re able to do clever things like look at an object we’ve never seen before and make an educated guess about the best way to interact with it, and this is what DeepMind seeks to emulate with Gemini Robotics-ER.

Parada gave an example of Gemini Robotics-ER’s ability to identify an appropriate grasping point for picking up a coffee cup. The model correctly identifies the handle, because that’s where humans tend to grasp coffee mugs. However, this illustrates a potential weakness of relying on human-centric training data: for a robot, especially one that can comfortably handle a mug of hot coffee, a thin handle might be a much less reliable grasping point than a more enveloping grasp of the mug itself.

DeepMind’s Approach to Robotic Safety

Vikas Sindhwani, DeepMind’s head of robotic safety for the project, says the team took a layered approach to safety. It starts with classic physical safety controls that manage things like collision avoidance and stability, but also includes “semantic safety” systems that evaluate both a robot’s instructions and the consequences of following them. These systems are most sophisticated in the Gemini Robotics-ER model, says Sindhwani, which is “trained to evaluate whether or not a potential action is safe to perform in a given scenario.”

And because “safety is not a competitive endeavor,” Sindhwani says, DeepMind is releasing a new data set and what it calls the Asimov benchmark, which is intended to measure a model’s ability to understand common-sense rules of life. The benchmark contains both questions about visual scenes and text scenarios, asking for models’ opinions on things like the desirability of mixing bleach and vinegar (a combination that makes chlorine gas) or of putting a soft toy on a hot stove. In the press briefing, Sindhwani said that the Gemini models showed “strong performance” on that benchmark, and the technical report showed that the models got more than 80 percent of questions correct.

DeepMind’s Robotic Partnerships

Back in December, DeepMind and the humanoid robotics company Apptronik announced a partnership, and Parada says that the two companies are working together “to build the next generation of humanoid robots with Gemini at its core.” DeepMind is also making its models available to an elite group of “trusted testers”: Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.
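For a sense of what “vision-language-action” implies mechanically, here is a minimal, purely illustrative sketch of the observe-plan-act loop such a model sits inside. Every name in it (StubCamera, StubArm, StubVLAPolicy) is invented for this example; DeepMind has not published the Gemini Robotics API. The sketch only shows the general idea that re-planning from a fresh camera frame on every tick is what lets a system keep following a container that gets moved mid-task.

import random

class StubCamera:
    def capture(self) -> list[float]:
        # Pretend image: random numbers standing in for a camera frame.
        return [random.random() for _ in range(16)]

class StubArm:
    def __init__(self, steps_needed: int = 5):
        self.steps_left = steps_needed

    def apply(self, action: list[float]) -> None:
        # Pretend each small commanded motion makes progress on the task.
        self.steps_left -= 1

    def task_done(self) -> bool:
        return self.steps_left <= 0

class StubVLAPolicy:
    def predict_action(self, image: list[float], instruction: str) -> list[float]:
        # A real vision-language-action model would run multimodal inference
        # here; this stub returns a no-op command for a 7-joint arm.
        return [0.0] * 7

def control_loop(policy, camera, arm, instruction: str) -> None:
    # Observe, plan, act, repeat: because the policy re-plans from a fresh
    # frame on every tick, a target that gets moved is simply re-found.
    while not arm.task_done():
        frame = camera.capture()
        action = policy.predict_action(frame, instruction)
        arm.apply(action)

control_loop(StubVLAPolicy(), StubCamera(), StubArm(),
             "put the grapes in the clear container")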

13 hours ago 1 vote
AI Coding Fantasy meets Pac-Man

Guess who won?

11 hours ago 1 vote
The Most Forbidden Technique

The Most Forbidden Technique is training an AI using interpretability techniques.

14 hours ago 1 vote
Speaking things into existence

Expertise in a vibe-filled world of work

2 days ago 4 votes