Full Width [alt+shift+f] FOCUS MODE Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
35
Yesterday I debated Eliezer Yudkowsky This is totally me spitballing here, but I think the crux of the debate is whether AIs will look more like formal systems or like inscrutable weight matrices. I think it’s the latter, and I think it’s the something like the latter forever. This is a good thing. I’m so glad we reached an empirical point of disagreement. Can superintelligent AIs “solve” the one shot prisoners dilemma and play cooperate-cooperate? Yudkowsky thinks yes, I think no. The payoff matrix makes the solution very clear. You want to convince your opponent to cooperate while you defect. Of course you end up with defect-defect while you’d both rather cooperate-cooperate, but that’s just how it works. Is this what the field of AI alignment is trying to “solve”? For two complex systems in the real world with time bounds, I’m 99% sure this is impossible. MIRI has done research into this, and has shown it was possible for “modal agents.” However, it’s clearly not solvable for...
over a year ago

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from the singularity is nearer

you are a good person

In my previous post, I advocate turning against the unproductive. Whenever you decide to turn against a group, it’s very important to prevent purity spirals. There needs to be a bright line that doesn’t move. Here is that line. You should be, on net, producing more than you are consuming. You shouldn’t feel bad if you are producing less than you could be. But at the end of your life, total it all up. You should have produced more than you consumed. We used to make shit in this country, build shit. It needs to stop. I have to believe that the average person is net positive, because if they aren’t, we’re already too far gone, and any prospect of a democracy is over. But if we aren’t too far gone, we have to stop the hemorrhaging. The unproductive rich are in cahoots with the unproductive poor to take from you. And it’s really the unproductive rich that are the problem. They loudly frame helping the unproductive as a moral issue for helping the poor because they know deep down they are unproductive losers. But they aren’t beyond saving. They just need to make different choices. This cultural change starts with you. Private equity, market manipulators, real estate, lawyers, lobbyists. This is no longer okay. You know the type of person I’m talking about. Let’s elevate farmers, engineers, manufacturing, miners, construction, food prep, delivery, operations. Jobs that produce value that you can point to. There’s a role for everyone in society. From productive billionaires to the fry cook at McDonalds. They are both good people. But negative sum jobs need to no longer be socially okay. The days of living off the work of everyone else are over. We live in a society. You have to produce more than you consume.

3 days ago 7 votes
you will blame the wrong people

Billionaires, am I right? Immigrants, am I right? It’s going to be so painful to watch. Billionaires will go away. Immigrants will go away. The problems will continue to get worse. The problem is the unproductive. The rich unproductive and the poor unproductive. The finance middle-man and welfare recipient. The real estate agent and the person on disability. The person with a fake job. Anyone who purposefully creates complexity for others. The obstructionists. Rent seekers. Anyone who lobbies against others getting so they can have relatively more. Broken systems that elevate zero sum losers. Why can’t we all turn against them? The first step is recognizing this is the problem. I don’t know why most people don’t see it.

4 days ago 6 votes
you can never go back

Total disassociation, fully out your mind That Funny Feeling I was thinking today about a disc jockey. Like one in the 80s, where you actually had to put the records on the turntables to get the music. You move the information. You were the file system. I like the Retro Game Mechanics channel on YouTube. What was possible was limited by the hardware, and in a weird way it forced games to be good. Skill was apparent by a quick viewing, and different skill is usually highly correlated. Good graphics meant good story – not true today. I was thinking about all the noobs showing up to comma. If you can put a technical barrier up to stop them, like it used to be. But you can’t. These barriers can’t be fake, because a fake barrier isn’t like a real barrier. A fake barrier is one small patch away from being gone. What if the Internet was a mistake? I feel like it’s breaking my brain. It was this mind expanding world in my childhood, but now it’s a set of narrow loops that are harder and harder to get out of. And you can’t escape it. Once you have Starlink to your phone, not having the Internet with you will be a choice, not a real barrier. There’s nowhere to hide. Chris McCandless wanted to be an explorer, but being born in 1968 meant that the world was already all explored. His clever solution, throw away the map. But that didn’t make him an explorer, it made him an idiot who died 5 miles from a bridge that would have saved his life. And I’ll tell you something else that you ain’t dying enough to know Big Casino Sure, you can still spin real records, code for the NES, and SSH into your comma device. But you don’t have to. And that makes the people who do it come from a different distribution from the people who used to. They are not explorers in the same way Chris McCandless wasn’t. When I found out about the singularity at 15, I was sure it was going to happen. It was depressing for a while, realizing that machines would be able to do everything a lot better than I could. But then I realized that it wasn’t like that yet and I could still work on this problem. And here I am, working in AI 20 years later. I thought I came to grips with obsolescence. But it’s not obsolescence, the reality is looking to be so much sadder than I imagined. It won’t be humans accepting the rise of the machines, it won’t be humans fighting the rise of the machines, it will be human shaped zoo animals oddly pacing back and forth in a corner of the cage while the world keeps turning around them. It’s easy to see the appeal of conspiracy theories. Even if they hate you, it’s more comforting to believe that they exist. That at least somebody is driving. But that’s not true. It’s just going. There are no longer Western institutions capable of making sense of the world. (maybe the Chinese ones can? it’s hard to tell) We are shoved up brutally against evolution, just of the memetic variety. The TikTok brainrot kids will be nothing compared to the ChatGPT brainrot kids. And I’m not talking like an old curmudgeon about the new forms of media being bad and the youth being bad like Socrates said. Because you can never go back. It will be whatever it is. To every fool preaching the end of history, evolution spits in your face. To every fool preaching the world government AI singleton, evolution spits in your face. I knew these things intellectually, but viscerally it’s just hard to live through. The world feels so small and I feel like I’m being stared at by the Eye of Sauron.

5 days ago 12 votes
The World After Wireheading

Hold my hand, grow my skin Erica Western Geiger Counter Do you have any addictions? You may not register them as such, perhaps because they don’t lead to anything you consider harmful consequences. But you have them. In some ways, all your behavior is compulsive. What would the alternative be? A point is, if we have something that we can predict this video Free will comes from the “veil of computability”, things look random until you find the pattern. I was at a bar last night and this girl told me you can’t predict humans, and the exact example she used was that it’s not like y = mx + b Oh, if only she knew. The dreams of my childhood have come true, studying machine learning has shown me how I work. I tried to explain that instead of 2 parameters it’s 100 trillion parameters, and it’s the slightly different y = relu(w@x) + b a bunch of times, you have to put some nonlinearities in there cause linear systems can only approximate a small class of functions. But this explanation was not heard at a bar. She was so confident she was right, and like I don’t even know where to start. Reader of this blog, do you know? AI is coming and we are so unbelievably unprepared. What is this garbage and this garbage. It’s nerd shit and political propaganda. The amount of power over nature that the Silicon Valley death cult is stumbling into is horrifying, and these high priests don’t have a basic grasp of people. No humanities education (perhaps the programs were gutted on purpose). Are we ready for the hypnodrones? How the fuck is targeted advertising legal and culturally okay? This will not stop until they take our free will from us. There’s a fire that burns today Better Nukes don’t end humanity. Current path AI doesn’t end humanity. It just ends all the machines and hands the world over to the street people. Now I see how the dark ages happened. If all the humans died today, all the machines would shortly follow. If all the machines died today, humanity would keep on going. Pay attention to this milestone. To date, machines are not robust, and evolution may be efficient at robust search. If it is, we get dark ages. If it’s not and we find a shortcut, God only knows.

a month ago 39 votes
Can tinygrad win?

This is not going to be a cakewalk like self driving cars. Most of comma’s competition is now out of business, taking billions and billions of dollars with it. Re: Tesla and FSD, we always expected Tesla to have the lead, but it’s not a winner take all market, it will look more like iOS vs Android. comma has been around for 10 years, is profitable, and is now growing rapidly. In self driving, most of the competition wasn’t even playing the right game. This isn’t how it is for ML frameworks. tinygrad’s competition is playing the right game, open source, and run by some quite smart people. But this is my second startup, so hopefully taking a bit more risk is appropriate. For comma to win, all it would take is people in 2016 being wrong about LIDAR, mapping, end to end, and hand coding, which hopefully we all agree now that they were. For tinygrad to win, it requires something much deeper to be wrong about software development in general. As it stands now, tinygrad is 14556 lines. Line count is not a perfect proxy for complexity, but when you have differences of multiple orders of magnitude, it might mean something. I asked ChatGPT to estimate the lines of code in PyTorch, JAX, and MLIR. JAX = 400k MLIR = 950k PyTorch = 3300k They range from one to two orders of magnitude off. And this isn’t even including all the libraries and drivers the other frameworks rely on, CUDA, cuBLAS, Triton, nccl, LLVM, etc…. tinygrad includes every single piece of code needed to drive an AMD RDNA3 GPU except for LLVM, and we plan to remove LLVM in a year or two as well. But so what? What does line count matter? One hypothesis is that tinygrad is only smaller because it’s not speed or feature competitive, and that if and when it becomes competitive, it will also be that many lines. But I just don’t think that’s true. tinygrad is already feature competitive, and for speed, I think the bitter lesson also applies to software. When you look at the machine learning ecosystem, you realize it’s just the same problems over and over again. The problem of multi machine, multi GPU, multi SM, multi ALU, cross machine memory scheduling, DRAM scheduling, SRAM scheduling, register scheduling, it’s all the same underlying problem at different scales. And yet, in all the current ecosystems, there are completely different codebases and libraries at each scale. I don’t think this stands. I suspect there is a simple formulation of the problem underlying all of the scheduling. Of course, this problem will be in NP and hard to optimize, but I’m betting the bitter lesson wins here. The goal of the tinygrad project is to abstract away everything except the absolute core problem in the cleanest way possible. This is why we need to replace everything. A model for the hardware is simple compared to a model for CUDA. If we succeed, tinygrad will not only be the fastest NN framework, but it will be under 25k lines all in, GPT-5 scale training job to MMIO on the PCIe bus! Here are the steps to get there: Expose the underlying search problem spanning several orders of magnitude. Due to the execution of neural networks not being data dependent, this problem is very amenable to search. Make sure your formulation is simple and complete. Fully capture all dimensions of the search space. The optimization goal is simple, run faster. Apply the state of the art in search. Burn compute. Use LLMs to guide. Use SAT solvers. Reinforcement learning. It doesn’t matter, there’s no way to cheat this goal. Just see if it runs faster. If this works, not only do we win with tinygrad, but hopefully people begin to rethink software in general. Of course, it’s a big if, this isn’t like comma where it was hard to lose. But if it wins… The main thing to watch is development speed. Our bet has to be that tinygrad’s development speed is outpacing the others. We have the AMD contract to train LLaMA 405B as fast as NVIDIA due in a year, let’s see if we succeed.

2 months ago 28 votes

More in programming

Apologies and forgiveness

The first in a series of posts about doing things the right way

5 hours ago 3 votes
Understanding Bazel remote caching

A deep dive into the Action Cache, the CAS, and the security issues that arise from using Bazel with a remote cache but without remote execution

6 hours ago 3 votes
Trying to Make Sense of Casing Conventions on the Web

(I present to you my stream of consciousness on the topic of casing as it applies to the web platform.) I’m reading about the new command and commandfor attributes — which I’m super excited about, declarative behavior invocation in HTML? YES PLEASE!! — and one thing that strikes me is the casing in these APIs. For example, the command attribute has a variety of values in HTML which correspond to APIs in JavaScript. The show-popover attribute value maps to .showPopover() in JavaScript. hide-popover maps to .hidePopover(), etc. So what we have is: lowercase in attribute names e.g. commandfor="..." kebab-case in attribute values e.g. show-popover camelCase for JS counterparts e.g. showPopover() After thinking about this a little more, I remember that HTML attributes names are case insensitive, so the browser will normalize them to lowercase during parsing. Given that, I suppose you could write commandFor="..." but it’s effectively the same. Ok, lowercase attribute names in HTML makes sense. The related popover attributes follow the same convention: popovertarget popovertargetaction And there are many other attribute names in HTML that are lowercase, e.g.: maxlength novalidate contenteditable autocomplete formenctype So that all makes sense. But wait, there are some attribute names with hyphens in them, like aria-label="..." and data-value="...". So why isn’t it command-for="..."? Well, upon further reflection, I suppose those attributes were named that way for extensibility’s sake: they are essentially wildcard attributes that represent a family of attributes that are all under the same namespace: aria-* and data-*. But wait, isn’t that an argument for doing popover-target and popover-target-action? Or command and command-for? But wait (I keep saying that) there are kebab-case attribute names in HTML — like http-equiv on the <meta> tag, or accept-charset on the form tag — but those seem more like legacy exceptions. It seems like the only answer here is: there is no rule. Naming is driven by convention and decisions are made on a case-by-case basis. But if I had to summarize, it would probably be that the default casing for new APIs tends to follow the rules I outlined at the start (and what’s reflected in the new command APIs): lowercase for HTML attributes names kebab-case for HTML attribute values camelCase for JS counterparts Let’s not even get into SVG attribute names We need one of those “bless this mess” signs that we can hang over the World Wide Web. Email · Mastodon · Bluesky

yesterday 5 votes
The Angels and Demons of Nondeterminism

Greetings everyone! You might have noticed that it's September and I don't have the next version of Logic for Programmers ready. As penance, here's ten free copies of the book. So a few months ago I wrote a newsletter about how we use nondeterminism in formal methods. The overarching idea: Nondeterminism is when multiple paths are possible from a starting state. A system preserves a property if it holds on all possible paths. If even one path violates the property, then we have a bug. An intuitive model of this is that for this is that when faced with a nondeterministic choice, the system always makes the worst possible choice. This is sometimes called demonic nondeterminism and is favored in formal methods because we are paranoid to a fault. The opposite would be angelic nondeterminism, where the system always makes the best possible choice. A property then holds if any possible path satisfies that property.1 This is not as common in FM, but it still has its uses! "Players can access the secret level" or "We can always shut down the computer" are reachability properties, that something is possible even if not actually done. In broader computer science research, I'd say that angelic nondeterminism is more popular, due to its widespread use in complexity analysis and programming languages. Complexity Analysis P is the set of all "decision problems" (basically, boolean functions) can be solved in polynomial time: there's an algorithm that's worst-case in O(n), O(n²), O(n³), etc.2 NP is the set of all problems that can be solved in polynomial time by an algorithm with angelic nondeterminism.3 For example, the question "does list l contain x" can be solved in O(1) time by a nondeterministic algorithm: fun is_member(l: List[T], x: T): bool { if l == [] {return false}; guess i in 0..<(len(l)-1); return l[i] == x; } Say call is_member([a, b, c, d], c). The best possible choice would be to guess i = 2, which would correctly return true. Now call is_member([a, b], d). No matter what we guess, the algorithm correctly returns false. and just return false. Ergo, O(1). NP stands for "Nondeterministic Polynomial". (And I just now realized something pretty cool: you can say that P is the set of all problems solvable in polynomial time under demonic nondeterminism, which is a nice parallel between the two classes.) Computer scientists have proven that angelic nondeterminism doesn't give us any more "power": there are no problems solvable with AN that aren't also solvable deterministically. The big question is whether AN is more efficient: it is widely believed, but not proven, that there are problems in NP but not in P. Most famously, "Is there any variable assignment that makes this boolean formula true?" A polynomial AN algorithm is again easy: fun SAT(f(x1, x2, …: bool): bool): bool { N = num_params(f) for i in 1..=num_params(f) { guess x_i in {true, false} } return f(x_1, x_2, …) } The best deterministic algorithms we have to solve the same problem are worst-case exponential with the number of boolean parameters. This a real frustrating problem because real computers don't have angelic nondeterminism, so problems like SAT remain hard. We can solve most "well-behaved" instances of the problem in reasonable time, but the worst-case instances get intractable real fast. Means of Abstraction We can directly turn an AN algorithm into a (possibly much slower) deterministic algorithm, such as by backtracking. This makes AN a pretty good abstraction over what an algorithm is doing. Does the regex (a+b)\1+ match "abaabaabaab"? Yes, if the regex engine nondeterministically guesses that it needs to start at the third letter and make the group aab. How does my PL's regex implementation find that match? I dunno, backtracking or NFA construction or something, I don't need to know the deterministic specifics in order to use the nondeterministic abstraction. Neel Krishnaswami has a great definition of 'declarative language': "any language with a semantics has some nontrivial existential quantifiers in it". I'm not sure if this is identical to saying "a language with an angelic nondeterministic abstraction", but they must be pretty close, and all of his examples match: SQL's selects and joins Parsing DSLs Logic programming's unification Constraint solving On top of that I'd add CSS selectors and planner's actions; all nondeterministic abstractions over a deterministic implementation. He also says that the things programmers hate most in declarative languages are features that "that expose the operational model": constraint solver search strategies, Prolog cuts, regex backreferences, etc. Which again matches my experiences with angelic nondeterminism: I dread features that force me to understand the deterministic implementation. But they're necessary, since P probably != NP and so we need to worry about operational optimizations. Eldritch Nondeterminism If you need to know the ratio of good/bad paths, the number of good paths, or probability, or anything more than "there is a good path" or "there is a bad path", you are beyond the reach of heaven or hell. Angelic and demonic nondeterminism are duals: angelic returns "yes" if some choice: correct and demonic returns "no" if !all choice: correct, which is the same as some choice: !correct. ↩ Pet peeve about Big-O notation: O(n²) is the set of all algorithms that, for sufficiently large problem sizes, grow no faster that quadratically. "Bubblesort has O(n²) complexity" should be written Bubblesort in O(n²), not Bubblesort = O(n²). ↩ To be precise, solvable in polynomial time by a Nondeterministic Turing Machine, a very particular model of computation. We can broadly talk about P and NP without framing everything in terms of Turing machines, but some details of complexity classes (like the existence "weak NP-hardness") kinda need Turing machines to make sense. ↩

yesterday 5 votes
Announcing the 2025 TokyoDev Developers Survey

The 2025 edition of the TokyoDev Developer Survey is now live! If you’re a software developer living in Japan, please take a few minutes to participate. All questions are optional, and it should take less than 10 minutes to complete. The survey will remain open until September 30th. Last year, we received over 800 responses. Highlights included: Median compensation remained stable. The pay gap between international and Japanese companies narrowed to 47%. Fewer respondents had the option to work fully remotely. For 2025, we’ve added several new questions, including a dedicated section on one of the most talked-about topics in development today: AI. The survey is completely anonymous, and only aggregated results will be shared—never personally identifiable information. The more responses we get, the deeper and more meaningful our insights will be. Please help by taking the survey and sharing it with your peers!

yesterday 8 votes