The Pull Request

from Tyler Cipriani: blog [alt+shift+b] in programming

A brief and biased history. Oh yeah, there’s pull requests now – GitHub blog, Sat, 23 Feb 2008 When GitHub launched, it had no code review. Three years after launch, in 2011, GitHub user rtomayko became the first person to make a real code comment, which read, in full: “+1”. Before that, GitHub lacked any way to comment on code directly. Instead, pull requests were a combination of two simple features: Cross repository compare view – a feature they’d debuted in 2010—git diff in a web page. A comments section – a feature most blogs had in the 90s. There was no way to thread comments, and the comments were on a different page than the diff. GitHub pull requests circa 2010. This is from the official documentation on GitHub. Earlier still, when the pull request debuted, GitHub claimed only that pull requests were “a way to poke someone about code”—a way to direct message maintainers, but one that lacked any web view of the code whatsoever. For developers, it worked like this: Make a...

11 months ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Tyler Cipriani: blog

Digging into git commit templates

Any code of your own that you haven’t looked at for six or more months might as well have been written by someone else. – Eagleson’s Law After scouring git history, I found the correct config file, but someone removed it. Their full commit message read: Remove config. Don't bring it back. Very. helpful. But I get it; it’s hard to care about commit messages when you’re making a quick change. Git commit templates can help. Commit templates provide a scaffold for your commit messages, reminding you to answer questions like: What problem are you solving? Why is this the solution? What alternatives did you consider? Where can I read more? What is a git commit template? When you type git commit, git pops open your text editor1. Git can pre-fill your editor with a commit template—a form that reminds you of everything it’s easy to forget when writing a commit. Creating a commit template is simple. Create a plaintext file – mine lives at ~/.config/git/message.txt Tell git to use it: git config --global \ commit.template '~/.config/git/message.txt' My template packs everything I know about writing a commit. Project-specific templates Large projects, such as Linux kernel, git, and MediaWiki, have their own commit guidelines. Git templates can remind you about these per-project requirements if you add a commit template to a project’s .git/config file. Another way to do this is git’s includeIf configuration setting. includeIf lets you override git config settings when you’re working under directories you define. For example, all my Wikimedia work lives in ~/Projects/Wikimedia and at the bottom of my ~/.config/git/config I have: [includeIf "gitdir:~/Projects/Wikimedia/**"] path = ~/.config/git/config.wikimedia In config.wikimedia, I point to my Wikimedia-specific commit template (along with other necessary git settings: my user.email, core.hooksPath, and a pushInsteadOf url to push to ssh even when I clone via https). Forge-specific templates Personal git commit templates lead to better commits, which make for a better history. The forge-specific pull-request templates are a band-aid, the cheap kind that falls off in the shower. There’s no incentive for GitHub to make git history better: the worse your commit history, the more you rely on GitHub. Still, all the major pull-request-style forges let you foist a pull-request template on your contributors. As a contributor, I dislike filling those out—they add unnecessary friction. Commit message contents Your commit template allows you do the hard thinking upfront. Then, when you make a commit, you simply follow the template. My template asks questions I answer with my commit message: 72ch. wide -------------------------------------------------------- BODY # | # - Why should this change be made? | # - What problem are you solving? | # - Why this solution? | # - What's wrong with the current code? | # - Are there other ways to do it? | # - How can the reviewer confirm it works? | # | # ---------------------------------------------------------------- /BODY But other clever folks cooked up conventions you could incorporate: Conventional commits – how do your commits relate to semantic versioning? This makes it easier for SRE and downstream users. Problem/Solution format – first pioneered by ZeroMQ2, this format anticipates the questions of future developers and reviewers. Gitmoji – developed for the GitHub crowd, this format defines an emoji shorthand that makes it easy to spot changes of a particular type. Commit message formatting How you format text affects how people read it. My template also deals with text formatting rules3: Subject – 50 characters or less, capitalized, no end punctuation. Body – Wrap at 72 characters with a blank line separating it from the subject. Trailers – Standard formats with a blank line separating them from the body. People will read your commit in different contexts: git log, git shortlog, and git rebase. But git’s pager has no line wrapping by default. I hard wrap at 72 characters because that makes text easier to read in wide terminals.4 Finally, my template addresses trailers, reminding me about standard trailers supported in the projects I’m working on. Git can interpret trailers, which can be useful later. For example, if I wanted a tab-separated list of commits and their related tasks I could find that with git log: $ TAB=%x09 $ BUG_TRAILER='%(trailers:key=Bug,valueonly=true,separator=%x2C )' $ SHORT_HASH=%h $ SUBJ=%s $ FORMAT="${SHORT_HASH}${TAB}${BUG_TRAILER}${TAB}${GIT_SUBJ}" $ git log --topo-order --no-merges \ --format="$FORMAT" d2b09deb12f T359762 Rewrite Kurdish (ku) Latin to Arabic converter 28123a6a262 T332865 tests: Remove non-static fallback in HookRunnerTestBase 4e919a307a4 T328919 tests: Remove unused argument from data provider in PageUpdaterTest bedd0f685f9 objectcache: Improve `RESTBagOStuff::handleError()` 2182a0c4490 T393219 tests: Remove two data provider in RestStructureTest Git commit templates free your brain from remembering what you should write, allowing you to focus on the story you should tell. Your future self will thank you for the effort. Starting with core.editor in your git config, $VISUAL or $EDITOR in your shell, finally falling back to vi.↩︎ I think…↩︎ All cribbed from Tim Pope↩︎ Another story I’ve heard: a standard terminal allows 80 characters per line. git log indents commit messages with 4 spaces. A 72-character-per-line commit centers text on an 80-character-per-line terminal. To me, readability in modern terminals is a better reason to wrap than kowtowing to antiquated terminals.↩︎

2 months ago • 34 votes

Boox Go 10.3, two months in

[The] Linux kernel uses GPLv2, and if you distribute GPLv2 code, you have to provide a copy of the source (and modifications) once someone asks for it. And now I’m asking nicely for you to do so 🙂 – Joga, bbs.onyx-international.com Boox in split screen, typewriter mode In January, I bought a Boox Go 10.3—a 10.3-inch, 300-ppi, e-ink Android tablet. After two months, I use the Boox daily—it’s replaced my planner, notebook, countless PDF print-offs, and the good parts of my phone. But Boox’s parent company, Onyx, is sketchy. I’m conflicted. The Boox Go is a beautiful, capable tablet that I use every day, but I recommend avoiding as long as Onyx continues to disregard the rights of its users. How I’m using my Boox My e-ink floor desk Each morning, I plop down in front of my MagicHold laptop stand and journal on my Boox with Obsidian. I use Syncthing to back up my planner and sync my Zotero library between my Boox and laptop. In the evening, I review my PDF planner and plot for tomorrow. I use these apps: Obsidian – a markdown editor that syncs between all my devices with no fuss for $8/mo. Syncthing – I love Syncthing—it’s an encrypted, continuous file sync-er without a centralized server. Meditation apps1 – Guided meditation away from the blue light glow of my phone or computer is better. Before buying the Boox, I considered a reMarkable. The reMarkable Paper Pro has a beautiful color screen with a frontlight, a nice pen, and a “type folio,” plus it’s certified by the Calm Tech Institute. But the reMarkable is a distraction-free e-ink tablet. Meanwhile, I need distraction-lite. What I like Calm(ish) technology – The Boox is an intentional device. Browsing the internet, reading emails, and watching videos is hard, but that’s good. Apps – Google Play works out of the box. I can install F-Droid and change my launcher without difficulty. Split screen – The built-in launcher has a split screen feature. I use it to open a PDF side-by-side with a notes doc. Reading – The screen is a 300ppi Carta 1200, making text crisp and clear. What I dislike I filmed myself typing at 240fps, each frame is 4.17ms. Boox’s typing latency is between 150ms and 275ms at the fastest refresh rate inside Obsidian. Typing – Typing latency is noticeable. At Boox’s highest refresh rate, after hitting a key, text takes between 150ms to 275ms to appear. I can still type, though it’s distracting at times. The horror of the default pen Accessories Pen – The default pen looks like a child’s whiteboard marker and feels cheap. I replaced it with the Kindle Scribe Premium pen, and the writing experience is vastly improved. Cover – It’s impossible to find a nice cover. I’m using a $15 cover that I’m encasing in stickers. Tool switching – Swapping between apps is slow and clunky. I blame Android and the current limitations of e-ink more than Boox. No frontlight – The Boox’s lack of frontlight prevents me from reading more with it. I knew this when I bought my Boox, but devices with frontlights seem to make other compromises. Onyx The Chinese company behind Boox, Onyx International, Inc., runs the servers where Boox shuttles tracking information. I block this traffic with Pi-Hole2. pihole-ing whatever telemetry Boox collects I inspected this traffic via Mitm proxy—most traffic was benign, though I never opted into sending any telemetry (nor am I logged in to a Boox account). But it’s also an Android device, so it’s feeding telemetry into Google’s gaping maw, too. Worse, Onyx is flouting the terms of the GNU Public License, declining to release Linux kernel modifications to users. This is anathema to me—GPL violations are tantamount to theft. Onyx’s disregard for user rights makes me regret buying the Boox. Verdict I’ll continue to use the Boox and feel bad about it. I hope my digging in this post will help the next person. Unfortunately, the e-ink tablet market is too niche to support the kind of solarpunk future I’d always imagined. But there’s an opportunity for an open, Linux-based tablet to dominate e-ink. Linux is playing catch-up on phones with PostmarketOS. Meanwhile, the best e-ink tablets have to offer are old, unupdateable versions of Android, like the OS on the Boox. In the future, I’d love to pay a license- and privacy-respecting company for beautiful, calm technology and recommend their product to everyone. But today is not the future. I go back and forth between “Waking Up” and “Calm”↩︎ Using github.com/JordanEJ/Onyx-Boox-Blocklist↩︎

5 months ago • 31 votes

Eventually consistent plain text accounting

.title { text-wrap: balance } Spending for October, generated by piping hledger → R Over the past six months, I’ve tracked my money with hledger—a plain text double-entry accounting system written in Haskell. It’s been surprisingly painless. My previous attempts to pick up real accounting tools floundered. Hosted tools are privacy nightmares, and my stint with GnuCash didn’t last. But after stumbling on Dmitry Astapov’s “Full-fledged hledger” wiki1, it clicked—eventually consistent accounting. Instead of modeling your money all at once, take it one hacking session at a time. It should be easy to work towards eventual consistency. […] I should be able to [add financial records] bit by little bit, leaving things half-done, and picking them up later with little (mental) effort. – Dmitry Astapov, Full-Fledged Hledger Principles of my system I’ve cobbled together a system based on these principles: Avoid manual entry – Avoid typing in each transaction. Instead, rely on CSVs from the bank. CSVs as truth – CSVs are the only things that matter. Everything else can be blown away and rebuilt anytime. Embrace version control – Keep everything under version control in Git for easy comparison and safe experimentation. Learn hledger in five minutes hledger concepts are heady, but its use is simple. I divide the core concepts into two categories: Stuff hledger cares about: Transactions – how hledger moves money between accounts. Journal files – files full of transactions Stuff I care about: Rules files – how I set up accounts, import CSVs, and move money between accounts. Reports – help me see where my money is going and if I messed up my rules. Transactions move money between accounts: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 This transaction shows that on Jan 1, 2024, money moved from income:work into assets:checking—Payday. The sum of each transaction should be $0. Money comes from somewhere, and the same amount goes somewhere else—double-entry accounting. This is powerful technology—it makes mistakes impossible to ignore. Journal files are text files containing one or more transactions: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 2024-01-02 QUANSHENG UVK5 assets:checking $-29.34 expenses:fun:radio $29.34 Rules files transform CSVs into journal files via regex matching. Here’s a CSV from my bank: Transaction Date,Description,Category,Type,Amount,Memo 09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00, 09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31, 09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56, 09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00, 09/02/2024,Amazon web services,Personal,Sale,-17.89, And here’s a checking.rules to transform that CSV into a journal file so I can use it with hledger: # checking.rules # -------------- # Map CSV fields → hledger fields[0] fields date,description,category,type,amount,memo,_ # `account1`: the account for the whole CSV.[1] account1 assets:checking account2 expenses:unknown skip 1 date-format %m/%d/%Y currency $ if %type Payment account2 income:unknown if %category Food & Drink account2 expenses:food:dining # [0]: <https://hledger.org/hledger.html#field-names> # [1]: <https://hledger.org/hledger.html#account-field> With these two files (checking.rules and 2024-09_checking.csv), I can make the CSV into a journal: $ > 2024-09_checking.journal \ hledger print \ --rules-file checking.rules \ -f 2024-09_checking.csv $ head 2024-09_checking.journal 2024-09-01 DEPOSIT Paycheck assets:checking $1000.00 income:unknown $-1000.00 2024-09-02 Amazon web services assets:checking $-17.89 expenses:unknown $17.89 Reports are interesting ways to view transactions between accounts. There are registers, balance sheets, and income statements: $ hledger incomestatement \ --depth=2 \ --file=2024-09_bank.journal Revenues: $1000.00 income:unknown ----------------------- $1000.00 Expenses: $42.31 expenses:food $63.45 expenses:unknown ----------------------- $105.76 ----------------------- Net: $894.24 At the beginning of September, I spent $105.76 and made $1000, leaving me with $894.24. But a good chunk is going to the default expense account, expenses:unknown. I can use the hleger aregister to see what those transactions are: $ hledger areg expenses:unknown \ --file=2024-09_checking.journal \ -O csv | \ csvcut -c description,change | \ csvlook | description | change | | ------------------------ | ------ | | OBSIDIAN.MD | 10.00 | | Amazon web services | 17.89 | | Amazon.com*XXXXXXXXY | 35.56 | l Then, I can add some more rules to my checking.rules: if OBSIDIAN.MD account2 expenses:personal:subscriptions if Amazon web services account2 expenses:personal:web:hosting if Amazon.com account2 expenses:personal:shopping:amazon Now, I can reprocess my data to get a better picture of my spending: $ > 2024-09_bank.journal \ hledger print \ --rules-file bank.rules \ -f 2024-09_bank.csv $ hledger bal expenses \ --depth=3 \ --percent \ -f 2024-09_checking2.journal 30.0 % expenses:food:dining 33.6 % expenses:personal:shopping 9.5 % expenses:personal:subscriptions 16.9 % expenses:personal:web -------------------- 100.0 % For the Amazon.com purchase, I lumped it into the expenses:personal:shopping account. But I could dig deeper—download my order history from Amazon and categorize that spending. This is the power of working bit-by-bit—the data guides you to the next, deeper rabbit hole. Goals and non-goals Why am I doing this? For years, I maintained a monthly spreadsheet of account balances. I had a balance sheet. But I still had questions. Spending over six months, generated by piping hledger → gnuplot Before diving into accounting software, these were my goals: Granular understanding of my spending – The big one. This is where my monthly spreadsheet fell short. I knew I had money in the bank—I kept my monthly balance sheet. I budgeted up-front the % of my income I was saving. But I had no idea where my other money was going. Data privacy – I’m unwilling to hand the keys to my accounts to YNAB or Mint. Increased value over time – The more time I put in, the more value I want to get out—this is what you get from professional tools built for nerds. While I wished for low-effort setup, I wanted the tool to be able to grow to more uses over time. Non-goals—these are the parts I never cared about: Investment tracking – For now, I left this out of scope. Between monthly balances in my spreadsheet and online investing tools’ ability to drill down, I was fine.2 Taxes – Folks smarter than me help me understand my yearly taxes.3 Shared system – I may want to share reports from this system, but no one will have to work in it except me. Cash – Cash transactions are unimportant to me. I withdraw money from the ATM sometimes. It evaporates. hledger can track all these things. My setup is flexible enough to support them someday. But that’s unimportant to me right now. Monthly maintenance I spend about an hour a month checking in on my money Which frees me to spend time making fancy charts—an activity I perversely enjoy. Income vs. Expense, generated by piping hledger → gnuplot Here’s my setup: $ tree ~/Documents/ledger . ├── export │ ├── 2024-balance-sheet.txt │ └── 2024-income-statement.txt ├── import │ ├── in │ │ ├── amazon │ │ │ └── order-history.csv │ │ ├── credit │ │ │ ├── 2024-01-01_2024-02-01.csv │ │ │ ├── ... │ │ │ └── 2024-10-01_2024-11-01.csv │ │ └── debit │ │ ├── 2024-01-01_2024-02-01.csv │ │ ├── ... │ │ └── 2024-10-01_2024-11-01.csv │ └── journal │ ├── amazon │ │ └── order-history.journal │ ├── credit │ │ ├── 2024-01-01_2024-02-01.journal │ │ ├── ... │ │ └── 2024-10-01_2024-11-01.journal │ └── debit │ ├── 2024-01-01_2024-02-01.journal │ ├── ... │ └── 2024-10-01_2024-11-01.journal ├── rules │ ├── amazon │ │ └── journal.rules │ ├── credit │ │ └── journal.rules │ ├── debit │ │ └── journal.rules │ └── common.rules ├── 2024.journal ├── Makefile └── README Process: Import – download a CSV for the month from each account and plop it into import/in/<account>/<dates>.csv Make – run make Squint – Look at git diff; if it looks good, git add . && git commit -m "💸" otherwise review hledger areg to see details. The Makefile generates everything under import/journal: journal files from my CSVs using their corresponding rules. reports in the export folder I include all the journal files in the 2024.journal with the line: include ./import/journal/*/*.journal Here’s the Makefile: SHELL := /bin/bash RAW_CSV = $(wildcard import/in/**/*.csv) JOURNALS = $(foreach file,$(RAW_CSV),$(subst /in/,/journal/,$(patsubst %.csv,%.journal,$(file)))) .PHONY: all all: $(JOURNALS) hledger is -f 2024.journal > export/2024-income-statement.txt hledger bs -f 2024.journal > export/2024-balance-sheet.txt .PHONY clean clean: rm -rf import/journal/**/*.journal import/journal/%.journal: import/in/%.csv @echo "Processing csv $< to $@" @echo "---" @mkdir -p $(shell dirname $@) @hledger print --rules-file rules/$(shell basename $$(dirname $<))/journal.rules -f "$<" > "$@" If I find anything amiss (e.g., if my balances are different than what the bank tells me), I look at hleger areg. I may tweak my rules or my CSVs and then I run make clean && make and try again. Simple, plain text accounting made simple. And if I ever want to dig deeper, hledger’s docs have more to teach. But for now, the balance of effort vs. reward is perfect. while reading a blog post from Jonathan Dowland↩︎ Note, this is covered by full-fledged hledger – Investements↩︎ Also covered in full-fledged hledger – Tax returns↩︎

9 months ago • 55 votes

Subliminal git commits

Luckily, I speak Leet. – Amita Ramanujan, Numb3rs, CBS’s IRC Drama There’s an episode of the CBS prime-time drama Numb3rs that plumbs the depths of Dr. Joel Fleischman’s1 knowledge of IRC. In one scene, Fleischman wonders, “What’s ‘leet’”? “Leet” is writing that replaces letters with numbers, e.g., “Numb3rs,” where 3 stands in for e. In short, leet is like the heavy-metal “S” you drew in middle school: Sweeeeet. / \ / | \ | | | \ \ | | | \ | / \ / ASCII art version of your misspent youth. Following years of keen observation, I’ve noticed Git commit hashes are also letters and numbers. Git commit hashes are, as Fleischman might say, prime targets for l33tification. What can I spell with a git commit? DenITDao via orlybooks) With hexidecimal we can spell any word containing the set of letters {A, B, C, D, E, F}—DEADBEEF (a classic) or ABBABABE (for Mama Mia aficionados). This is because hexidecimal is a base-16 numbering system—a single “digit” represents 16 numbers: Base-10: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 15 Base-16: 0 1 2 3 4 5 6 7 8 9 A B C D E F Leet expands our palette of words—using 0, 1, and 5 to represent O, I, and S, respectively. I created a script that scours a few word lists for valid words and phrases. With it, I found masterpieces like DADB0D (dad bod), BADA55 (bad ass), and 5ADBAB1E5 (sad babies). Manipulating commit hashes for fun and no profit Git commit hashes are no mystery. A commit hash is the SHA-1 of a commit object. And a commit object is the commit message with some metadata. $ mkdir /tmp/BADA55-git && cd /tmp/BAD55-git $ git init Initialized empty Git repository in /tmp/BADA55-git/.git/ $ echo '# BADA55 git repo' > README.md && git add README.md && git commit -m 'Initial commit' [main (root-commit) 68ec0dd] Initial commit 1 file changed, 1 insertion(+) create mode 100644 README.md $ git log --oneline 68ec0dd (HEAD -> main) Initial commit Let’s confirm we can recreate the commit hash: $ git cat-file -p 68ec0dd > commit-msg $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 68ec0dd6dead532f18082b72beeb73bd828ee8fc /dev/fd/63 Our repo’s first commit has the hash 68ec0dd. My goal is: Make 68ec0dd be BADA55. Keep the commit message the same, visibly at least. But I’ll need to change the commit to change the hash. To keep those changes invisible in the output of git log, I’ll add a \t and see what happens to the hash. $ truncate -s -1 commit-msg # remove final newline $ printf '\t\n' >> commit-msg # Add a tab $ # Check the new SHA to see if it's BADA55 $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 27b22ba5e1c837a34329891c15408208a944aa24 /dev/fd/63 Success! I changed the SHA-1. Now to do this until we get to BADA55. Fortunately, user not-an-aardvark created a tool for that—lucky-commit that manipulates a commit message, adding a combination of \t and [:space:] characters until you hit a desired SHA-1. Written in rust, lucky-commit computes all 256 unique 8-bit strings composed of only tabs and spaces. And then pads out commits up to 48-bits with those strings, using worker threads to quickly compute the SHA-12 of each commit. It’s pretty fast: $ time lucky_commit BADA555 real 0m0.091s user 0m0.653s sys 0m0.007s $ git log --oneline bada555 (HEAD -> main) Initial commit $ xxd -c1 <(git cat-file -p 68ec0dd) | grep -cPo ': (20|09)' 12 $ xxd -c1 <(git cat-file -p HEAD) | grep -cPo ': (20|09)' 111 Now we have an more than an initial commit. We have a BADA555 initial commit. All that’s left to do is to make ALL our commits BADA55 by abusing git hooks. $ cat > .git/hooks/post-commit && chmod +x .git/hooks/post-commit #!/usr/bin/env bash echo 'L337-ifying!' lucky_commit BADA55 $ echo 'A repo that is very l33t.' >> README.md && git commit -a -m 'l33t' L337-ifying! [main 0e00cb2] l33t 1 file changed, 1 insertion(+) $ git log --oneline bada552 (HEAD -> main) l33t bada555 Initial commit And now I have a git repo almost as cool as the sweet “S” I drew in middle school. This is a Northern Exposure spin off, right? I’ve only seen 1:48 of the show…↩︎ or SHA-256 for repos that have made the jump to a more secure hash function↩︎

10 months ago • 74 votes

More in programming

It's beginning to feel like the 80s in America again

Have I told you how much I've come to dislike the 90s? The depressive music, the ironic distance to everything, the deconstructive narratives, the moral relativism, and the total cultural takeover of postmodern ideology. Oh, I did that just last week? Well, allow me another go. But rather than railing against the 90s, let me tell you about the 80s. They were amazing. America was firing on all cylinders, Reagan had brought the morning back, and the Soviet Union provided a clear black-and-white adversarial image. But it was the popular culture of the era that still fills me with hiraeth. It was the time of earnest storytelling. When Rocky could just train real hard to avenge the death of his friend by punching Dolph Lundgreen for 10 minutes straight in a montage of blood, sweat, and tears. After which even the Russians couldn't help themselves but cheer for him. Not a shred of irony. Just pure "if you work real hard, you can do the right thing" energy. Or what about Weird Science from 1985? Two nerds bring Barbie to life, and she teaches them to talk to real girls. It was goofy, it was kitsch, but it was also earnest and honest. Nerdy teenage boys have a hard time talking to girls! But they can learn how, and if they do, it'll all work out. That movie was actually the earliest memory I have of wanting to move to America. I don't remember exactly when I saw it, but I remember at the end thinking, "I have to go there." Such was the magnetic power of that American 80s earnest optimism! Or what about the music? Do you have any idea what the 1986 music video for Sabrina's BOYS could do for a young Danish boy who'd only just discovered the appeal of girls? Wonders, is what. Wonders. And again, it depicted this goofy but earnest energy. Boys and girls like each other! They have fun with the chase. The genders are not doomed to opposing trenches on the Eastern Front for all eternity. So back to today. It feels like we're finally emerging from this constant 90s Seattle drizzle to sunny 80s LA vibes in America. The constant pessimism, the cancellation militias, and the walking-on-eggshells atmosphere have given way to something far brighter, bolder, and, yes, better. An optimism, a levity, a confidence. First, America has Sweeney fever. The Sydney Sweeney Has Great Jeans campaign has been dominating the discourse for weeks, but rather than back down with a groveling apology, American Eagle has doubled down with a statement and now the Las Vegas sphere! And Sweeney herself has also kept her foot on the gas. Now there's a new Baskin-Robbins campaign that's equal parts Weird Science (two nerds!) and "Boys Boys Boys" music video. No self-referential winks to how this might achksually be ProBLeMAtIC. Just a confident IT GIRL grabbing the attention of a nation. The male role model too has been going through a rehabilitation lately. I absolutely loved F1: The Movie. It's classic Jerry Bruckheimer Top Gun pathos: Real men doing dangerous stuff for the chase, the glory, and the next generation with a love story that involves a competent yet feminine female partner. Again, it's an entirely sincere story with great morals. The generational gap is real, but we can learn from each other. Young hotshots have speed but lack experience. Old-timers can be grumpy but their wisdom was hard-won. Even the diversity in that movie fails to feel forced! It's a woman leading the engineering in a male-dominated world, and she can't quite hack it at first. Her designs are too by-the-book. But then Brad Pitt sells the gamble of a dirty-air fighter design, she steps up her game, and wins the crown by her wits and talent. Tell me that isn't a wholesome story! David Foster Wallace nailed this all in his critique of postmodernism. He called out the irony, cynicism, and irreverence that had fully permeated the culture from the 90s forward. And frankly, he was bored with it! It's an amazing interview to watch today. Wallace had the diagnosis nailed back in 1997. But it's taken us until these mid-2020s to fully embrace its conclusions. We need earnest values and virtues. We need sincere stories that are not afraid of grand narratives. That don't constantly have to deconstruct "but what is good really?", and dare embrace a solid defense of "some ways of being really are better." We also need to have fun! We need to throw away these shit-tinted glasses that see everything in the world as a problematic example of some injustice or oppression. We a bit of gratitude for technology and progress! That's what the Sweeney campaign is doing. That's what Brad Pitt is racing for in F1: The Movie. That's what I'm here for! Because as much as I love the croissants of the Old World, I find myself craving that uniquely American brand of optimism, enthusiasm, and determination more whenever I've been back in Europe for too long. Give me some Weird Science! Give me some Sabrina at the pool! Give me some American 80s vibes!

11 hours ago • 2 votes

Writing: Blog Posts and Songs

I was listening to a podcast interview with the Jackson Browne (American singer/songwriter, political activist, and inductee into the Rock and Roll Hall of Fame) and the interviewer asks him how he approaches writing songs with social commentaries and critiques — something along the lines of: “How do you get from the New York Times headline on a social subject to the emotional heart of a song that matters to each individual?” Browne discusses how if you’re too subtle, people won’t know what you’re talking about. And if you’re too direct, you run the risk of making people feel like they’re being scolded. Here’s what he says about his songwriting: I want this to sound like you and I were drinking in a bar and we’re just talking about what’s going on in the world. Not as if you’re at some elevated place and lecturing people about something they should know about but don’t but [you think] they should care. You have to get to people where [they are, where] they do care and where they do know. I think that’s a great insight for anyone looking to have a connecting, effective voice. I know for me, it’s really easily to slide into a lecturing voice — you “should” do this and you “shouldn’t” do that. But I like Browne’s framing of trying to have an informal, conversational tone that meets people where they are. Like you’re discussing an issue in the bar, rather than listening to a sermon. Chris Coyier is the canonical example of this that comes to mind. I still think of this post from CSS Tricks where Chris talks about how to have submit buttons that go to different URLs: When you submit that form, it’s going to go to the URL /submit. Say you need another submit button that submits to a different URL. It doesn’t matter why. There is always a reason for things. The web is a big place and all that. He doesn’t conjure up some universally-applicable, justified rationale for why he’s sharing this method. Nor is there any pontificating on why this is “good” or “bad”. Instead, like most of Chris’ stuff, I read it as a humble acknowledgement of the practicalities at hand — “Hey, the world is a big place. People have to do crafty things to make their stuff work. And if you’re in that situation, here’s something that might help what ails ya.” I want to work on developing that kind of a voice because I love reading voices like that. Email · Mastodon · Bluesky

2 days ago • 5 votes

Doing versus Delegating

A staff+ skill

2 days ago • 8 votes

p-fast trie, but smaller

Previously, I wrote some sketchy ideas for what I call a p-fast trie, which is basically a wide fan-out variant of an x-fast trie. It allows you to find the longest matching prefix or nearest predecessor or successor of a query string in a set of names in O(log k) time, where k is the key length. My initial sketch was more complicated and greedy for space than necessary, so here’s a simplified revision. (“p” now stands for prefix.) layout A p-fast trie stores a lexicographically ordered set of names. A name is a sequence of characters from some small-ish character set. For example, DNS names can be represented as a set of about 50 letters, digits, punctuation and escape characters, usually one per byte of name. Names that are arbitrary bit strings can be split into chunks of 6 bits to make a set of 64 characters. Every unique prefix of every name is added to a hash table. An entry in the hash table contains: A shared reference to the closest name lexicographically greater than or equal to the prefix. Multiple hash table entries will refer to the same name. A reference to a name might instead be a reference to a leaf object containing the name. The length of the prefix. To save space, each prefix is not stored separately, but implied by the combination of the closest name and prefix length. A bitmap with one bit per possible character, corresponding to the next character after this prefix. For every other prefix that matches this prefix and is one character longer than this prefix, a bit is set in the bitmap corresponding to the last character of the longer prefix. search The basic algorithm is a longest-prefix match. Look up the query string in the hash table. If there’s a match, great, done. Otherwise proceed by binary chop on the length of the query string. If the prefix isn’t in the hash table, reduce the prefix length and search again. (If the empty prefix isn’t in the hash table then there are no names to find.) If the prefix is in the hash table, check the next character of the query string in the bitmap. If its bit is set, increase the prefix length and search again. Otherwise, this prefix is the answer. predecessor Instead of putting leaf objects in a linked list, we can use a more complicated search algorithm to find names lexicographically closest to the query string. It’s tricky because a longest-prefix match can land in the wrong branch of the implicit trie. Here’s an outline of a predecessor search; successor requires more thought. During the binary chop, when we find a prefix in the hash table, compare the complete query string against the complete name that the hash table entry refers to (the closest name greater than or equal to the common prefix). If the name is greater than the query string we’re in the wrong branch of the trie, so reduce the length of the prefix and search again. Otherwise search the set bits in the bitmap for one corresponding to the greatest character less than the query string’s next character; if there is one remember it and the prefix length. This will be the top of the sub-trie containing the predecessor, unless we find a longer match. If the next character’s bit is set in the bitmap, continue searching with a longer prefix, else stop. When the binary chop has finished, we need to walk down the predecessor sub-trie to find its greatest leaf. This must be done one character at a time – there’s no shortcut. thoughts In my previous note I wondered how the number of search steps in a p-fast trie compares to a qp-trie. I have some old numbers measuring the average depth of binary, 4-bit, 5-bit, 6-bit and 4-bit, 5-bit, dns qp-trie variants. A DNS-trie varies between 7 and 15 deep on average, depending on the data set. The number of steps for a search matches the depth for exact-match lookups, and is up to twice the depth for predecessor searches. A p-fast trie is at most 9 hash table probes for DNS names, and unlikely to be more than 7. I didn’t record the average length of names in my benchmark data sets, but I guess they would be 8–32 characters, meaning 3–5 probes. Which is far fewer than a qp-trie, though I suspect a hash table probe takes more time than chasing a qp-trie pointer. (But this kind of guesstimate is notoriously likely to be wrong!) However, a predecessor search might need 30 probes to walk down the p-fast trie, which I think suggests a linked list of leaf objects is a better option.

2 days ago • 5 votes

Software books I wish I could read

New Logic for Programmers Release! v0.11 is now available! This is over 20% longer than v0.10, with a new chapter on code proofs, three chapter overhauls, and more! Full release notes here. Software books I wish I could read I'm writing Logic for Programmers because it's a book I wanted to have ten years ago. I had to learn everything in it the hard way, which is why I'm ensuring that everybody else can learn it the easy way. Books occupy a sort of weird niche in software. We're great at sharing information via blogs and git repos and entire websites. These have many benefits over books: they're free, they're easily accessible, they can be updated quickly, they can even be interactive. But no blog post has influenced me as profoundly as Data and Reality or Making Software. There is no blog or talk about debugging as good as the Debugging book. It might not be anything deeper than "people spend more time per word on writing books than blog posts". I dunno. So here are some other books I wish I could read. I don't think any of them exist yet but it's a big world out there. Also while they're probably best as books, a website or a series of blog posts would be ok too. Everything about Configurations The whole topic of how we configure software, whether by CLI flags, environmental vars, or JSON/YAML/XML/Dhall files. What causes the configuration complexity clock? How do we distinguish between basic, advanced, and developer-only configuration options? When should we disallow configuration? How do we test all possible configurations for correctness? Why do so many widespread outages trace back to misconfiguration, and how do we prevent them? I also want the same for plugin systems. Manifests, permissions, common APIs and architectures, etc. Configuration management is more universal, though, since everybody either uses software with configuration or has made software with configuration. The Big Book of Complicated Data Schemas I guess this would kind of be like Schema.org, except with a lot more on the "why" and not the what. Why is important for the Volcano model to have a "smokingAllowed" field?1 I'd see this less as "here's your guide to putting Volcanos in your database" and more "here's recurring motifs in modeling interesting domains", to help a person see sources of complexity in their own domain. Does something crop up if the references can form a cycle? If a relationship needs to be strictly temporary, or a reference can change type? Bonus: path dependence in data models, where an additional requirement leads to a vastly different ideal data model that a company couldn't do because they made the old model. (This has got to exist, right? Business modeling is a big enough domain that this must exist. Maybe The Essence of Software touches on this? Man I feel bad I haven't read that yet.) Computer Science for Software Engineers Yes, I checked, this book does not exist (though maybe this is the same thing). I don't have any formal software education; everything I know was either self-taught or learned on the job. But it's way easier to learn software engineering that way than computer science. And I bet there's a lot of other engineers in the same boat. This book wouldn't have to be comprehensive or instructive: just enough about each topic to understand why it's an area of study and appreciate how research in it eventually finds its way into practice. MISU Patterns MISU, or "Make Illegal States Unrepresentable", is the idea of designing system invariants in the structure of your data. For example, if a Contact needs at least one of email or phone to be non-null, make it a sum type over EmailContact, PhoneContact, EmailPhoneContact (from this post). MISU is great. Most MISU in the wild look very different than that, though, because the concept of MISU is so broad there's lots of different ways to achieve it. And that means there are "patterns": smart constructors, product types, properly using sets, newtypes to some degree, etc. Some of them are specific to typed FP, while others can be used in even untyped languages. Someone oughta make a pattern book. My one request would be to not give them cutesy names. Do something like the Aarne–Thompson–Uther Index, where items are given names like "Recognition by manner of throwing cakes of different weights into faces of old uncles". Names can come later. The Tools of '25 Not something I'd read, but something to recommend to junior engineers. Starting out it's easy to think the only bit that matters is the language or framework and not realize the enormous amount of surrounding tooling you'll have to learn. This book would cover the basics of tools that enough developers will probably use at some point: git, VSCode, very basic Unix and bash, curl. Maybe the general concepts of tools that appear in every ecosystem, like package managers, build tools, task runners. That might be easier if we specialize this to one particular domain, like webdev or data science. Ideally the book would only have to be updated every five years or so. No LLM stuff because I don't expect the tooling will be stable through 2026, to say nothing of 2030. A History of Obsolete Optimizations Probably better as a really long blog series. Each chapter would be broken up into two parts: A deep dive into a brilliant, elegant, insightful historical optimization designed to work within the constraints of that era's computing technology What we started doing instead, once we had more compute/network/storage available. c.f. A Spellchecker Used to Be a Major Feat of Software Engineering. Bonus topics would be brilliance obsoleted by standardization (like what people did before git and json were universal), optimizations we do today that may not stand the test of time, and optimizations from the past that did. Sphinx Internals I need this. I've spent so much goddamn time digging around in Sphinx and docutils source code I'm gonna throw up. Systems Distributed Talk Today! Online premier's at noon central / 5 PM UTC, here! I'll be hanging out to answer questions and be awkward. You ever watch a recording of your own talk? It's real uncomfortable! In this case because it's a field on one of Volcano's supertypes. I guess schemas gotta follow LSP too ↩

2 days ago • 10 votes

New here?

The Pull Request

Improve your reading experience

More from Tyler Cipriani: blog

More in programming

bored reading