More from the singularity is nearer
This is not going to be a cakewalk like self driving cars. Most of comma’s competition is now out of business, taking billions and billions of dollars with it. Re: Tesla and FSD, we always expected Tesla to have the lead, but it’s not a winner-take-all market; it will look more like iOS vs Android. comma has been around for 10 years, is profitable, and is now growing rapidly.

In self driving, most of the competition wasn’t even playing the right game. This isn’t how it is for ML frameworks. tinygrad’s competition is playing the right game, is open source, and is run by some quite smart people. But this is my second startup, so hopefully taking a bit more risk is appropriate. For comma to win, all it would take is people in 2016 being wrong about LIDAR, mapping, end to end, and hand coding, which hopefully we all agree now that they were. For tinygrad to win, it requires something much deeper to be wrong about software development in general.

As it stands now, tinygrad is 14,556 lines. Line count is not a perfect proxy for complexity, but when you have differences of multiple orders of magnitude, it might mean something. I asked ChatGPT to estimate the lines of code in PyTorch, JAX, and MLIR:

- JAX = 400k
- MLIR = 950k
- PyTorch = 3300k

They are one to two orders of magnitude larger. And this isn’t even including all the libraries and drivers the other frameworks rely on: CUDA, cuBLAS, Triton, NCCL, LLVM, etc. tinygrad includes every single piece of code needed to drive an AMD RDNA3 GPU except for LLVM, and we plan to remove LLVM in a year or two as well.

But so what? What does line count matter? One hypothesis is that tinygrad is only smaller because it’s not speed or feature competitive, and that if and when it becomes competitive, it will also be that many lines. But I just don’t think that’s true. tinygrad is already feature competitive, and for speed, I think the bitter lesson also applies to software.

When you look at the machine learning ecosystem, you realize it’s just the same problems over and over again. The problem of multi machine, multi GPU, multi SM, multi ALU, cross machine memory scheduling, DRAM scheduling, SRAM scheduling, register scheduling: it’s all the same underlying problem at different scales. And yet, in all the current ecosystems, there are completely different codebases and libraries at each scale. I don’t think this stands. I suspect there is a simple formulation of the problem underlying all of the scheduling. Of course, this problem will be in NP and hard to optimize, but I’m betting the bitter lesson wins here.

The goal of the tinygrad project is to abstract away everything except the absolute core problem in the cleanest way possible. This is why we need to replace everything: a model for the hardware is simple compared to a model for CUDA. If we succeed, tinygrad will not only be the fastest NN framework, it will be under 25k lines all in, from a GPT-5 scale training job down to MMIO on the PCIe bus!

Here are the steps to get there:

1. Expose the underlying search problem spanning several orders of magnitude. Due to the execution of neural networks not being data dependent, this problem is very amenable to search.
2. Make sure your formulation is simple and complete. Fully capture all dimensions of the search space. The optimization goal is simple: run faster.
3. Apply the state of the art in search. Burn compute. Use LLMs to guide. Use SAT solvers. Use reinforcement learning. It doesn’t matter; there’s no way to cheat this goal. Just see if it runs faster.
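To make step 3 concrete, here is a minimal sketch of what "just see if it runs faster" can look like as a loop. This is not tinygrad's actual API: kernel, candidate_schedules, and compile_and_time are hypothetical stand-ins, and the greedy beam search shown here is only one of many strategies that plug into the same objective.

```python
def search_schedule(kernel, candidate_schedules, compile_and_time, beam_width=4, depth=8):
    """Greedy beam search over scheduling actions for one kernel.

    kernel: some IR describing the computation.
    candidate_schedules(state): yields legal next scheduling actions
        (tiling, vectorization, where a buffer lives, ...).
    compile_and_time(state): measured wall-clock seconds on the real device.
    All three are hypothetical stand-ins: the only objective is "run faster",
    so beam search, RL, SAT solvers, or LLM guidance could all slot in here.
    """
    best_time, best_state = compile_and_time(kernel), kernel
    beam = [(best_time, best_state)]
    for _ in range(depth):
        frontier = []
        for _, state in beam:
            for apply_action in candidate_schedules(state):
                new_state = apply_action(state)
                frontier.append((compile_and_time(new_state), new_state))
        if not frontier:
            break  # no legal moves left at this depth
        frontier.sort(key=lambda ts: ts[0])  # faster is strictly better; no proxy metrics
        beam = frontier[:beam_width]
        if beam[0][0] < best_time:  # keep the incumbent so the answer never regresses
            best_time, best_state = beam[0]
    return best_state
```

tinygrad already does something in this spirit with its BEAM kernel search; the sketch above only illustrates the shape of the problem, not the real implementation.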
If this works, not only do we win with tinygrad, but hopefully people begin to rethink software in general. Of course, it’s a big if; this isn’t like comma, where it was hard to lose. But if it wins… The main thing to watch is development speed. Our bet has to be that tinygrad’s development speed is outpacing the others. We have the AMD contract to train LLaMA 405B as fast as NVIDIA due in a year; let’s see if we succeed.
I signed up for Hinge. Holy shit with the boosts. How does someone who works on this wake up every morning and feel okay about themselves? Similarly with the tip screens, the Uber algorithm, all the zero sum bullshit using all the tricks of psychology to extract a little bit more from every interaction in society. Nudge. Nudge. NUDGE.

Want to partake in normal society, like buying a coffee, going on a date, getting a ride, paying a friend? Oh, there’s a middleman now. An evil, ominous middleman using state of the art AI algorithms to extract just a little bit more from you. But eventually the market will fix this, right? People will feel sick of being manipulated and move elsewhere? Ahhh, but they see that coming long before you do. They have dashboards. Quick Jeeves, tune the AI to make people feel less manipulated. Give them a little bit more for now, we have to think about maximizing lifetime customer value here. Oh, the AI already did this on its own? Jeeves, you’ve been replaced! People perpetually on the edge.

You want to opt out of this all, you say? Good luck running a competitive business! Every metric is now a target. You better maximize engagement or you will lose engagement; this is a red queen’s race we can’t afford to lose! Burn all the social capital, burn all your values, FEED IT ALL TO MOLOCH!

Someday, people will have to realize we live in a society. What will it take? A complete self cannibalization to the point you can’t eat your own mouth? It sure as hell isn’t going to be people opting out; that’s a collective action problem you can’t solve. Democracy, haha, you think the algorithms will let you vote to kill them? Your vote is as decoupled from action as the amount Uber pays the driver is decoupled from the fare that you pay. There’s no reform here, there’s only revolution. Will it simply be a huge financial collapse? Or do we need World War 3? And even World War 3 is on a spectrum. Will mass starvation fix this? Or will the attitude of thinking it’s okay to manipulate others at scale persist even past that? He’s got his, and I’ve got mine… If you open a government S&P 500 account for everyone with $1,000 at birth that’ll pay their social security cause it like… goes up… wait, who’s creating this value again?

It’s not okay. Advertising is not okay. Price discrimination is not okay. Using big data, machine learning, and psychology to manipulate others at scale is not okay. But you aren’t going to learn this lesson until you have fed a huge majority of your customers to Moloch. Modern capitalism is wireheading. Release the hypnodrones. How many cans of Pepsi did you want them to consume an hour again?
You know about Critical Race Theory, right? It says that if there’s an imbalance in, say, income between races, it must be due to discrimination. This is what wokism seems to be, and it’s moronic and false. The right wing has invented something equally stupid. Introducing Critical Trade Theory, stolen from this tweet. If there’s an imbalance in trade between countries, it must be due to unfair practices. (not due to the obvious, like one country is 10x richer than the other) There’s really only one way the trade deficits will go away, and that’s if trade goes to zero (or maybe if all these countries become richer than America). Same thing with the race deficits, no amount of “leg up” bullshit will change them. Why are all the politicians in America anti-growth anti-reality idiots who want to drive us into the poor house? The way this tariff shit is being done is another stupid form of anti-merit benefits to chosen groups of people, with a whole lot of grift to go along with it. Makes me just not want to play.
Intel is sitting on a huge amount of card inventory they can’t move, largely because of bad software. Most of this is a summary of the public #intel-hardware channel in the tinygrad discord. Intel currently is sitting on:

- 15,000 Gaudi 2 cards (with baseboards)
- 5,100 Intel Data Center GPU Max 1450s (without baseboards)

If you were Intel, what would you do with them?

First, the Gaudi cards. The open source repo needed to control them was archived on Feb 4, 2025. There’s a closed source version of this that’s maybe still maintained, but eww, closed source, and do you think it’s really maintained? The architecture is kind of tragic, and that’s likely why they didn’t open source it. Unlike every other accelerator I have seen, the MMEs, which are where all the FLOPS are, are not controllable by the TPCs. While the TPCs have an LLVM port, the MME is not documented. After some poking around, I found the spec: it’s highly fixed function and looks very similar to the Apple ANE. But that’s not even the real problem with it. The problem is that it is controlled by queues, not by the TPCs. Unpacking habanalabs-dkms-1.19.2-32.all.deb you can find the queues. There is some way to push a command stream to the device so you don’t actually have to deal with the host itself for the queues. But that doesn’t prevent you from having to decompose the network you are trying to run into something you can put on this fixed function block.

Programmability is on a spectrum, ranging from CPUs being the easiest, to GPUs, to things like the Qualcomm DSP / Google TPU (where at least you drive the MME from the program), to this and the Apple ANE being the hardest. While it’s impressive that they actually got on MLPerf Training v4.0 training GPT-3, I suspect it’s all hand coded, and if you deviate at all off the trodden path you’ll get almost no perf. Accelerators like this are okay for low power inference where you can adjust the model architecture for the target; Apple does a great job of this. But this will never be acceptable for a training chip.

Then there’s the Data Center GPU Max 1450. Intel actually sent us a few of these. You quickly run into a problem: how do you plug them in? They need OAM sockets, 48V power, and a cooling solution that can sink 600W. As far as I can tell, they were only ever deployed in two systems, the Aurora Supercomputer and the Dell XE9640. It’s hard to know, but I really doubt many of these Dell systems were sold.

Intel then sent us this carrier board. In some ways it’s helpful, but in other ways it’s not at all. It still doesn’t solve cooling or power, and you need to buy 16x MCIO cables (cheap in quantity, but expensive and hard to find off the shelf). Also, I never got a straight answer, but I really doubt Intel has many of these boards, and that board doesn’t look cheap to manufacture more of. The connectors alone, which you need two of per GPU, cost $26 each. That’s $104 for just the OAM connectors.

tiny corp was in discussions to buy these GPUs. How much would you pay for one of these on a PCIe card? The specs look great: 839 TFLOPS, 128 GB of RAM, 3.3 TB/s of bandwidth. However… read this article. Even in simple synthetic benchmarks, the chip doesn’t get anywhere near its max performance, and it looks to be for fundamental reasons like memory latency. We estimate we could sell PCIe versions of these GPUs for $1,000; I don’t think most people know how hard it is to move non-NVIDIA hardware. Before you say you’d pay more, ask yourself: do you really want to deal with the software?
An adapter card has four pieces: a PCB for the card, a 12V to 48V voltage converter, a heatsink, and a fan. My quote from the guy who makes an OAM adapter board was $310 per PCB in 10+ quantity, and $75 for the voltage converter. A heatsink that can handle 600W (heat pipes + vapor chamber) is going to cost $100, then maybe $20 more for the fan. That’s $505, and you still need to assemble and test them, oh, and now there are tariffs. Maybe you can get this down to $400 in ~1,000 quantity.

So: $200 for the GPU, $400 for the adapter, $100 for shipping/fulfillment/returns (more if you use Amazon), and 30% profit if you sell at $1k. tiny would net $1M on this, which has to cover NRE, and you carry the risk of unsold inventory. We offered Intel $200 per GPU (a $680k wire) and they said no. They wanted $600. I suspect that unless a supercomputer person who already uses these GPUs wants to buy more, they will ride it to zero.

tl;dr: there are 5,100 of these GPUs with no simple way to plug them in. It’s unclear if they’re worth the cost of the slot they go in. I bet they end up shredded, or maybe dumped on eBay for $50 each in a year like the Xeon Phi cards. If you buy one, good luck plugging it in!

The reason Meta and friends buy some AMD is as a hedge against NVIDIA. Even if it’s not usable, AMD has progressed on a solid, steady roadmap, with a clear continuation from the 2018 MI50 (which you can now buy for 99% off) to the MI325X, which is a super exciting chip (AMD is king of chiplets). They are even showing signs of finally investing in software, which makes me bullish. If NVIDIA stumbles for a generation, this is AMD’s game. The ROCm “copy each NVIDIA repo” strategy actually works if your competition stumbles. They can win GPUs with slow and steady improvement plus a competition stumble; that’s how AMD won server CPUs.

With these Intel chips, I’m not sure who they would appeal to. Ponte Vecchio is cancelled. There’s no point in investing in the platform if there’s not going to be a next generation, and therefore nobody can justify the cost of developing software, therefore there won’t be software, therefore they aren’t worth plugging in.

Where does this leave Intel’s AI roadmap? The successor to Ponte Vecchio was Rialto Bridge, but that was cancelled. The successor to that was Falcon Shores, but that was also cancelled. Intel claims the next GPU will be “Jaguar Shores”, but fool me once… To quote JazzLord1234 from Reddit: “No point even bothering to listen to their roadmaps anymore. They have squandered all their credibility.” Gaudi 3 is a flop due to “unbaked software”, but as much as I usually do blame software, nothing has changed from Gaudi 2 and it’s just a really hard chip to program for. So there’s no future there either. I can’t say that the “Jaguar Shores” square instills confidence. It didn’t inspire confidence for “Joseph B.” on LinkedIn either.

From my interactions with Intel people, it seems there are no individuals with power there; it’s all committee-like leadership. The problem with this is there’s nobody who can say yes, just many people who can say no. Hence all the cancellations and the nonsense strategy. AMD’s dysfunction is different. From the beginning they had leadership that can do things (Lisa Su replied to my first e-mail); they just didn’t see the value in investing in software until recently. They sort of had a point if they were only targeting hyperscalers, but it seems like SemiAnalysis got through to them that hyperscalers aren’t going to deal with bad software either.
It remains to be seen if they can shift culture to actually deliver good software, but there’s movement in that direction, and if they succeed, AMD is so undervalued. Their hardware is good. With Intel, until that committee style leadership is gone, there’s 0 chance for success. Committee leadership is fine if you are trying to maintain, but Intel’s AI situation is even more hopeless than AMD’s, and you’d need something major to turn it around. At least with AMD, you can try installing ROCm and be frustrated when there are bugs. Every time I have tried Intel’s software, I can’t even recall getting the import to work, and the card wasn’t powerful enough that I cared. Intel needs actual leadership to turn this around, or there’s 0 future in Intel AI.
More in programming
Exploring diagram.website, I came across The Computer is a Feeling by Tim Hwang and Omar Rizwan:

the modern internet exerts a tyranny over our imagination. The internet and its commercial power has sculpted the computer-device. It's become the terrain of flat, uniform, common platforms and protocols, not eccentric, local, idiosyncratic ones.

Before computers were connected together, they were primarily personal. Once connected, they became primarily social. The purpose of the computer shifted to become social over personal.

The triumph of the internet has also impoverished our sense of computers as a tool for private exploration rather than public expression. The pre-network computer has no utility except as a kind of personal notebook, the post-network computer demotes this to a secondary purpose.

Smartphones are indisputably the personal computer. And yet, while being so intimately personal, they’re also the largest distribution of behavior-modification devices the world has ever seen. We all willingly carry around in our pockets a device whose content is largely designed to modify our behavior and extract our time and money.

Making “computer” mean computer-feelings and not computer-devices shifts the boundaries of what is captured by the word. It removes a great many things – smartphones, language models, “social” “media” – from the domain of the computational. It also welcomes a great many things – notebooks, papercraft, diary, kitchen – back into the domain of the computational.

I love the feeling of a personal computer, one whose purpose primarily resides in the domain of the individual and secondarily supports the social. It’s part of what I love about some of the ideas embedded in local-first, which start from the principle of owning and prioritizing what you do on your computer first and foremost, and then secondarily syncing that to other computers for the use of others.
I started working on Edna several months ago and I’ve implemented lots of functionality. Edna is a note-taking application with super powers. I figured I’ll make a series of posts about all the features I’ve added in the last few months. The first is multiple notes.

By default we start with 3 notes:

- scratch
- inbox
- daily journal

Here’s the note switcher (Ctrl + K). From the note switcher you can:

- quickly find a note by partial name
- open the selected note with Enter or a mouse click
- create a new note: enter a fully unique note name and press Enter, or Ctrl + Enter if it partially matches an existing note (I learned this trick from Notational Velocity)
- delete a note with Ctrl + Delete
- archive a note with the icon on the right
- star / un-star (add to favorites, remove from favorites) by clicking the star icon on the left
- assign a quick access shortcut Alt + <n>

You can also rename notes:

- via the context menu (right mouse click) and This note / Rename
- via Rename current note in the command palette (Ctrl + Shift + K)

Use the context menu’s This note sub-menu for note-related commands.

Note: I use Windows keyboard bindings. For the Mac equivalents, visit https://edna.arslexis.io/help#keyboard-shortcuts
I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption. I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. […]
As search gets worse and “working code” gets cheaper, apps get easier to make from scratch than to find.