Logo:

from tonsky.me in programming

Cyrillic version of Internet Explorer logo. Because it’s iconic.

3 months ago


More from tonsky.me

We shouldn’t have needed lockfiles

Imagine you’re writing a project and need a library. Let’s call it libpupa. You look up its current version, which is 1.2.3, and add it to your dependencies:

    "libpupa": "1.2.3"

In turn, the developer of libpupa, when writing its version 1.2.3, needed another library: liblupa. So they did the same thing: they looked up the version, which was 0.7.8 at the time, and added it to the dependencies of libpupa 1.2.3:

    "liblupa": "0.7.8"

The version 0.7.8 of liblupa is immortalized forever in the dependencies of libpupa 1.2.3. No matter how many other versions of either liblupa or libpupa are released, libpupa 1.2.3 will always depend on liblupa 0.7.8.

Our dependency resolution algorithm thus is like this:

1. Get the top-level dependency versions
2. Look up versions of libraries they depend on
3. Look up versions of libraries they depend on
4. ...

The important point of this algorithm is that it’s fully deterministic. Given just the top-level dependencies, it will produce the entire dependency tree, identical every time.

It’s also space-efficient: you don’t need to specify all the versions, just the top-level ones. Given libpupa 1.2.3, we will always arrive at liblupa 0.7.8. So why write it down in a separate file?

And that’s it. End of story. Write down your top-level dependencies. The computer will figure out transitive ones. They are guaranteed to be the same, since everything is immutable. The sun is shining, the grass is green, and builds are fully reproducible.

But people had to invent lockfiles.

Imagine you voluntarily made your builds non-reproducible by making them depend on time. If I build my app now, I get libpupa 1.2.3 and liblupa 0.7.8. If I repeat the same build in 10 minutes, I’ll get liblupa 0.7.9. Crazy, right? That would be chaos.

But this is what version ranges essentially are.
Instead of saying “libpupa 1.2.3 depends on liblupa 0.7.8”, they are saying “libpupa 1.2.3 depends on whatever the latest liblupa version is at the time of the build.”

Notice that this is determined not at the time of publishing, but at the time of the build! If the author of libpupa published 1.2.3 a year ago and I’m pulling it now, I might be using a liblupa version that didn’t even exist at the time of publishing!

But... why would libpupa’s author write a version range that includes versions that don’t exist yet? How could they know that liblupa 0.7.9, whenever it is released, will continue to work with libpupa? Surely they can’t see the future? Semantic versioning is a hint, but it has never been a guarantee.

For that, kids, I have no good answer.

The funny thing is, these version ranges end up not being used anyway. You lock your dependencies once in a lockfile and they stay there, unchanged. You don’t even get the good part!

I guess builds that depend on the calendar date are too crazy even for people who believe that referencing non-existing versions is fine.

“But Niki, you can regenerate the lockfile and pull in all the new dependencies!”

Sure. In exactly the same way you can update your top-level dependencies.

“But Niki, lockfiles help resolve version conflicts!”

In what way? Version conflicts don’t happen because of what’s written in dependency files. Your library might work with the newer dependency, or it might not. It doesn’t really depend on what the library’s author has guessed.

Your library might have a version range of 0.7.*, work with 0.7.8 and 0.7.9, but not with 0.7.10. Either way, the solution is the same: you have to pick the version that works. And the fact that someone somewhere a long time ago wrote 0.7.* doesn’t really help you.

“But Niki, if lockfiles exist, there must be a reason! People can’t be doing it for nothing!”

You are new in IT, I see. People absolutely can and do things here for no good reason all the time.
But if you want an existence proof: Maven. The Java library ecosystem has been going strong for 20 years, and during that time not once have we needed a lockfile. And we are pulling hundreds of libraries just to log two lines of text, so it is actively used at scale.

In conclusion, lockfiles are an absolutely unnecessary concept that complicates things without a good reason. Dependency managers can work without them, and do, just the same.
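The resolution algorithm described in the lockfiles piece above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `REGISTRY` is a hypothetical in-memory stand-in for an immutable package registry, `resolve` is a name made up here, and raising on a version conflict is a simplification of this sketch (the article does not say how conflicts should be handled, and real tools apply their own rules).

```python
# Hypothetical immutable registry: (name, version) -> exact pins of its direct deps.
# The library names and versions follow the article's example.
REGISTRY = {
    ("libpupa", "1.2.3"): {"liblupa": "0.7.8"},
    ("liblupa", "0.7.8"): {},
}


def resolve(top_level):
    """Walk exact version pins breadth-first and return the full dependency set."""
    resolved = {}
    queue = list(top_level.items())
    while queue:
        name, version = queue.pop(0)
        if name in resolved:
            # Assumption of this sketch: two different pins for one library is an error.
            if resolved[name] != version:
                raise ValueError(f"conflict on {name}: {resolved[name]} vs {version}")
            continue
        resolved[name] = version
        # Published versions never change, so this lookup gives the same answer forever.
        queue.extend(REGISTRY[(name, version)].items())
    return resolved


print(resolve({"libpupa": "1.2.3"}))
# {'libpupa': '1.2.3', 'liblupa': '0.7.8'}
```

Because every pin is exact and published versions are immutable, calling `resolve` today or a year from now yields the same result from the top-level dependencies alone, which is the property the article relies on.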

a month ago • 29 votes

Gaslight-driven development

Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions, not because they particularly want to or even need to. They do it because that’s what the computer told them to do. Like it or not, we are already serving the machines.

Well, now there is a new way to serve our silicon overlords. LLMs started to have opinions on how your API should look, and since 90% of all code will be written by AI come September, we have no choice but to oblige.

You might’ve heard the story of Soundslice adding a feature because ChatGPT kept telling people it exists. We see the same at Instant: for example, we used tx.update for both inserting and updating entities, but LLMs kept writing tx.create instead. Guess what: we now have tx.create, too.

Is it good or is it bad? It definitely feels strange. In a sense, it’s helpful: LLMs here have seen millions of other APIs and are suggesting the most obvious thing, something every developer would think of first, too.

It’s also a unique testing device: if developers use your API wrong, they blame themselves, read the documentation, and fix their code. In the end, you might never learn that they even had the problem. But with ChatGPT, you yourself can experience “newbie’s POV” at any time.

Of course, this approach doesn’t work if you are trying to do something new and unique. LLMs just won’t “get it”. But how many of us are doing something new and unique? Maybe an API is not the place to get clever? Maybe, for most cases, it’s truly best if you did the most obvious thing?

So welcome to the new era. AI is not just using tools we gave it. It now has opinions about how these tools should’ve been made. And instead of asking nicely, it gaslights everybody into thinking that’s how it’s always been.

a month ago • 38 votes

Podcast: Datomic: the most rock-and-roll database @ Тысяча фичей

How Datomic differs from other databases, and why the absence of an optimizer is sometimes better than its presence

2 months ago • 29 votes

Talk: Local-first is not going to win, but that’s okay @ Local-First Conf

We’ll explore the complexities of the traditional stack (db-server-frontend) and develop a theory of software evolution: which systems succeed and why. Then we’ll see how local-first fits into it and which local-first-adjacent practices are making software development easier and therefore are doomed for success (or not?)

3 months ago • 20 votes

More in programming

If Apple cared about privacy

Defaults matter

9 hours ago • 4 votes

ARM is great, ARM is terrible (and so is RISC-V)

I’ve long been interested in new and different platforms. I ran Debian on an Alpha back in the late 1990s and was part of the Alpha port team; then I helped bootstrap Debian on amd64. I’ve got somewhere around 8 Raspberry Pi devices in active use right now, and the free NNCPNET Internet email service …

7 hours ago • 2 votes

Many Hard Leetcode Problems are Easy Constraint Problems

In my first interview out of college I was asked the change counter problem:

Given a set of coin denominations, find the minimum number of coins required to make change for a given number. For example, for USA coinage and 37 cents, the minimum number is four (quarter, dime, 2 pennies).

I implemented the simple greedy algorithm and immediately fell into the trap of the question: the greedy algorithm only works for "well-behaved" denominations. If the coin values were [10, 9, 1], then making 37 cents would take 10 coins in the greedy algorithm but only 4 coins optimally (10+9+9+9). The "smart" answer is to use a dynamic programming algorithm, which I didn't know how to do. So I failed the interview.

But you only need dynamic programming if you're writing your own algorithm. It's really easy if you throw it into a constraint solver like MiniZinc and call it a day.

    int: total;
    array[int] of int: values = [10, 9, 1];
    array[index_set(values)] of var 0..: coins;

    constraint sum (c in index_set(coins)) (coins[c] * values[c]) == total;
    solve minimize sum(coins);

You can try this online here. It'll give you a prompt to put in total and then give you successively-better solutions:

    coins = [0, 0, 37];
    ----------
    coins = [0, 1, 28];
    ----------
    coins = [0, 2, 19];
    ----------
    coins = [0, 3, 10];
    ----------
    coins = [0, 4, 1];
    ----------
    coins = [1, 3, 0];
    ----------

Lots of similar interview questions are this kind of mathematical optimization problem, where we have to find the maximum or minimum of a function corresponding to constraints. They're hard in programming languages because programming languages are too low-level. They are also exactly the problems that constraint solvers were designed to solve. Hard leetcode problems are easy constraint problems.[1] Here I'm using MiniZinc, but you could just as easily use Z3 or OR-Tools or whatever your favorite generalized solver is.
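For readers who want to see the trap concretely, here is a minimal Python sketch of both hand-written approaches mentioned above: the greedy algorithm and a dynamic programming one. The function names `greedy_coins` and `min_coins` are made up for this illustration, and the greedy version assumes a 1-valued coin exists so that change can always be made.

```python
def greedy_coins(values, total):
    """Always take the largest coin that still fits (assumes a 1-valued coin exists)."""
    count = 0
    for v in sorted(values, reverse=True):
        count += total // v
        total %= v
    return count


def min_coins(values, total):
    """Dynamic programming: best[a] is the fewest coins that sum to exactly a."""
    INF = float("inf")
    best = [0] + [INF] * total
    for amount in range(1, total + 1):
        for v in values:
            if v <= amount and best[amount - v] + 1 < best[amount]:
                best[amount] = best[amount - v] + 1
    return best[total]


print(greedy_coins([10, 9, 1], 37))  # 10
print(min_coins([10, 9, 1], 37))     # 4
```

On the "well-behaved" USA denominations `[25, 10, 5, 1]` both functions return 4 for 37 cents; they only disagree on sets like `[10, 9, 1]`.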
More examples

This was a question in a different interview (which I thankfully passed):

Given a list of stock prices through the day, find the maximum profit you can get by buying one stock and selling one stock later.

It's easy to do in O(n^2) time, or if you are clever, you can do it in O(n). Or you could be not clever at all and just write it as a constraint problem:

    array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8];
    var int: buy;
    var int: sell;
    var int: profit = prices[sell] - prices[buy];

    constraint sell > buy;
    constraint profit > 0;
    solve maximize profit;

Reminder, link to trying it online here. While working at that job, one interview question we tested out was:

Given a list, determine if three numbers in that list can be added or subtracted to give 0.

This is a satisfaction problem, not an optimization problem: we don't need the "best answer", any answer will do. We eventually decided against it for being too tricky for the engineers we were targeting. But it's not tricky in a solver:

    include "globals.mzn";
    array[int] of int: numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8];
    array[index_set(numbers)] of var {0, -1, 1}: choices;

    constraint sum(n in index_set(numbers)) (numbers[n] * choices[n]) = 0;
    constraint count(choices, -1) + count(choices, 1) = 3;
    solve satisfy;

Okay, one last one, a problem I saw last year at Chipy AlgoSIG. Basically they pick some leetcode problems and we all do them. I failed to solve this one:

Given an array of integers heights representing the histogram's bar height where the width of each bar is 1, return the area of the largest rectangle in the histogram.
The "proper" solution is a tricky thing involving tracking lots of bookkeeping states, which you can completely bypass by expressing it as constraints:

    array[int] of int: numbers = [2,1,5,6,2,3];

    var 1..length(numbers): x;
    var 1..length(numbers): dx;
    var 1..: y;

    constraint x + dx <= length(numbers);
    constraint forall (i in x..(x+dx)) (y <= numbers[i]);

    var int: area = (dx+1)*y;
    solve maximize area;

    output ["(\(x)->\(x+dx))*\(y) = \(area)"];

There's even a way to automatically visualize the solution (using vis_geost_2d), but I didn't feel like figuring it out in time for the newsletter.

Is this better?

Now if I actually brought these questions to an interview the interviewee could ruin my day by asking "what's the runtime complexity?" Constraint solver runtimes are unpredictable and almost always slower than an ideal bespoke algorithm because they are more expressive, in what I refer to as the capability/tractability tradeoff. But even so, they'll do way better than a bad bespoke algorithm, and I'm not experienced enough in handwriting algorithms to consistently beat a solver.

The real advantage of solvers, though, is how well they handle new constraints. Take the stock picking problem above. I can write an O(n²) algorithm in a few minutes and the O(n) algorithm if you give me some time to think. Now change the problem to:

Maximize the profit by buying and selling up to max_sales stocks, but you can only buy or sell one stock at a given time and you can only hold up to max_hold stocks at a time.

That's a way harder problem to write even an inefficient algorithm for!
While the constraint problem is only a tiny bit more complicated:

    include "globals.mzn";
    int: max_sales = 3;
    int: max_hold = 2;
    array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8];
    array [1..max_sales] of var int: buy;
    array [1..max_sales] of var int: sell;
    array [index_set(prices)] of var 0..max_hold: stocks_held;
    var int: profit = sum(s in 1..max_sales) (prices[sell[s]] - prices[buy[s]]);

    constraint forall (s in 1..max_sales) (sell[s] > buy[s]);
    constraint profit > 0;

    constraint forall(i in index_set(prices)) (stocks_held[i] = (count(s in 1..max_sales) (buy[s] <= i) - count(s in 1..max_sales) (sell[s] <= i)));
    constraint alldifferent(buy ++ sell);
    solve maximize profit;

    output ["buy at \(buy)\n", "sell at \(sell)\n", "for \(profit)"];

Most constraint solving examples online are puzzles, like Sudoku or "SEND + MORE = MONEY". Solving leetcode problems would be a more interesting demonstration. And you get more interesting opportunities to teach optimizations, like symmetry breaking.

[1] Because my dad will email me if I don't explain this: "leetcode" is slang for "tricky algorithmic interview questions that have little-to-no relevance in the actual job you're interviewing for." It's from leetcode.com.
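For comparison with the solver models in the leetcode piece above, here are hand-written Python sketches of two of its problems: the O(n) single buy/sell stock problem and the largest rectangle in a histogram. These are standard textbook approaches rather than anything taken from the article, and the function names `max_profit` and `largest_rectangle` are made up for this illustration. One difference to note: `max_profit` returns 0 when no profitable trade exists, whereas the MiniZinc model's `constraint profit > 0` would make that case unsatisfiable.

```python
def max_profit(prices):
    """One pass: track the lowest price so far and the best sale against it."""
    best = 0
    lowest = prices[0]
    for p in prices[1:]:
        best = max(best, p - lowest)
        lowest = min(lowest, p)
    return best


def largest_rectangle(heights):
    """Monotonic stack of bar indices; the "bookkeeping" the article refers to."""
    stack = []  # indices of bars with strictly increasing heights
    best = 0
    # A trailing 0-height bar forces every remaining bar off the stack.
    for i, h in enumerate(heights + [0]):
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            left = stack[-1] + 1 if stack else 0
            best = max(best, height * (i - left))
        stack.append(i)
    return best


print(max_profit([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]))  # 8
print(largest_rectangle([2, 1, 5, 6, 2, 3]))              # 10
```

Both run in linear time, but neither adapts easily to extra rules such as `max_sales` or `max_hold`, which is the trade-off the article describes.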

7 hours ago • 2 votes

btrfs on a Raspberry Pi

I’m something of a filesystem geek, I guess. I first wrote about ZFS on Linux 14 years ago, and even before I used ZFS, I had used ext2/3/4, jfs, reiserfs, xfs, and no doubt some others. I’ve also used btrfs. I last posted about it in 2014, when I noted it has some advantages over …

yesterday • 3 votes

Stumbling upon

Something like a channel changer, for the web. That's what the idea was at first. But it led to a whole new path of discovery that even the site's creators couldn't have predicted. The post Stumbling upon appeared first on The History of the Web.

yesterday • 7 votes
