Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
11
[The] Linux kernel uses GPLv2, and if you distribute GPLv2 code, you have to provide a copy of the source (and modifications) once someone asks for it. And now I’m asking nicely for you to do so 🙂 – Joga, bbs.onyx-international.com Boox in split screen, typewriter mode In January, I bought a Boox Go 10.3—a 10.3-inch, 300-ppi, e-ink Android tablet. After two months, I use the Boox daily—it’s replaced my planner, notebook, countless PDF print-offs, and the good parts of my phone. But Boox’s parent company, Onyx, is sketchy. I’m conflicted. The Boox Go is a beautiful, capable tablet that I use every day, but I recommend avoiding as long as Onyx continues to disregard the rights of its users. How I’m using my Boox My e-ink floor desk Each morning, I plop down in front of my MagicHold laptop stand and journal on my Boox with Obsidian. I use Syncthing to back up my planner and sync my Zotero library between my Boox and laptop. In the evening, I review my PDF planner and plot for...
a month ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Tyler Cipriani: blog

Eventually consistent plain text accounting

.title { text-wrap: balance } Spending for October, generated by piping hledger → R Over the past six months, I’ve tracked my money with hledger—a plain text double-entry accounting system written in Haskell. It’s been surprisingly painless. My previous attempts to pick up real accounting tools floundered. Hosted tools are privacy nightmares, and my stint with GnuCash didn’t last. But after stumbling on Dmitry Astapov’s “Full-fledged hledger” wiki1, it clicked—eventually consistent accounting. Instead of modeling your money all at once, take it one hacking session at a time. It should be easy to work towards eventual consistency. […] I should be able to [add financial records] bit by little bit, leaving things half-done, and picking them up later with little (mental) effort. – Dmitry Astapov, Full-Fledged Hledger Principles of my system I’ve cobbled together a system based on these principles: Avoid manual entry – Avoid typing in each transaction. Instead, rely on CSVs from the bank. CSVs as truth – CSVs are the only things that matter. Everything else can be blown away and rebuilt anytime. Embrace version control – Keep everything under version control in Git for easy comparison and safe experimentation. Learn hledger in five minutes hledger concepts are heady, but its use is simple. I divide the core concepts into two categories: Stuff hledger cares about: Transactions – how hledger moves money between accounts. Journal files – files full of transactions Stuff I care about: Rules files – how I set up accounts, import CSVs, and move money between accounts. Reports – help me see where my money is going and if I messed up my rules. Transactions move money between accounts: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 This transaction shows that on Jan 1, 2024, money moved from income:work into assets:checking—Payday. The sum of each transaction should be $0. Money comes from somewhere, and the same amount goes somewhere else—double-entry accounting. This is powerful technology—it makes mistakes impossible to ignore. Journal files are text files containing one or more transactions: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 2024-01-02 QUANSHENG UVK5 assets:checking $-29.34 expenses:fun:radio $29.34 Rules files transform CSVs into journal files via regex matching. Here’s a CSV from my bank: Transaction Date,Description,Category,Type,Amount,Memo 09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00, 09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31, 09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56, 09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00, 09/02/2024,Amazon web services,Personal,Sale,-17.89, And here’s a checking.rules to transform that CSV into a journal file so I can use it with hledger: # checking.rules # -------------- # Map CSV fields → hledger fields[0] fields date,description,category,type,amount,memo,_ # `account1`: the account for the whole CSV.[1] account1 assets:checking account2 expenses:unknown skip 1 date-format %m/%d/%Y currency $ if %type Payment account2 income:unknown if %category Food & Drink account2 expenses:food:dining # [0]: <https://hledger.org/hledger.html#field-names> # [1]: <https://hledger.org/hledger.html#account-field> With these two files (checking.rules and 2024-09_checking.csv), I can make the CSV into a journal: $ > 2024-09_checking.journal \ hledger print \ --rules-file checking.rules \ -f 2024-09_checking.csv $ head 2024-09_checking.journal 2024-09-01 DEPOSIT Paycheck assets:checking $1000.00 income:unknown $-1000.00 2024-09-02 Amazon web services assets:checking $-17.89 expenses:unknown $17.89 Reports are interesting ways to view transactions between accounts. There are registers, balance sheets, and income statements: $ hledger incomestatement \ --depth=2 \ --file=2024-09_bank.journal Revenues: $1000.00 income:unknown ----------------------- $1000.00 Expenses: $42.31 expenses:food $63.45 expenses:unknown ----------------------- $105.76 ----------------------- Net: $894.24 At the beginning of September, I spent $105.76 and made $1000, leaving me with $894.24. But a good chunk is going to the default expense account, expenses:unknown. I can use the hleger aregister to see what those transactions are: $ hledger areg expenses:unknown \ --file=2024-09_checking.journal \ -O csv | \ csvcut -c description,change | \ csvlook | description | change | | ------------------------ | ------ | | OBSIDIAN.MD | 10.00 | | Amazon web services | 17.89 | | Amazon.com*XXXXXXXXY | 35.56 | l Then, I can add some more rules to my checking.rules: if OBSIDIAN.MD account2 expenses:personal:subscriptions if Amazon web services account2 expenses:personal:web:hosting if Amazon.com account2 expenses:personal:shopping:amazon Now, I can reprocess my data to get a better picture of my spending: $ > 2024-09_bank.journal \ hledger print \ --rules-file bank.rules \ -f 2024-09_bank.csv $ hledger bal expenses \ --depth=3 \ --percent \ -f 2024-09_checking2.journal 30.0 % expenses:food:dining 33.6 % expenses:personal:shopping 9.5 % expenses:personal:subscriptions 16.9 % expenses:personal:web -------------------- 100.0 % For the Amazon.com purchase, I lumped it into the expenses:personal:shopping account. But I could dig deeper—download my order history from Amazon and categorize that spending. This is the power of working bit-by-bit—the data guides you to the next, deeper rabbit hole. Goals and non-goals Why am I doing this? For years, I maintained a monthly spreadsheet of account balances. I had a balance sheet. But I still had questions. Spending over six months, generated by piping hledger → gnuplot Before diving into accounting software, these were my goals: Granular understanding of my spending – The big one. This is where my monthly spreadsheet fell short. I knew I had money in the bank—I kept my monthly balance sheet. I budgeted up-front the % of my income I was saving. But I had no idea where my other money was going. Data privacy – I’m unwilling to hand the keys to my accounts to YNAB or Mint. Increased value over time – The more time I put in, the more value I want to get out—this is what you get from professional tools built for nerds. While I wished for low-effort setup, I wanted the tool to be able to grow to more uses over time. Non-goals—these are the parts I never cared about: Investment tracking – For now, I left this out of scope. Between monthly balances in my spreadsheet and online investing tools’ ability to drill down, I was fine.2 Taxes – Folks smarter than me help me understand my yearly taxes.3 Shared system – I may want to share reports from this system, but no one will have to work in it except me. Cash – Cash transactions are unimportant to me. I withdraw money from the ATM sometimes. It evaporates. hledger can track all these things. My setup is flexible enough to support them someday. But that’s unimportant to me right now. Monthly maintenance I spend about an hour a month checking in on my money Which frees me to spend time making fancy charts—an activity I perversely enjoy. Income vs. Expense, generated by piping hledger → gnuplot Here’s my setup: $ tree ~/Documents/ledger . ├── export │   ├── 2024-balance-sheet.txt │   └── 2024-income-statement.txt ├── import │   ├── in │   │   ├── amazon │   │   │   └── order-history.csv │   │   ├── credit │   │   │   ├── 2024-01-01_2024-02-01.csv │   │   │   ├── ... │   │   │   └── 2024-10-01_2024-11-01.csv │   │   └── debit │   │   ├── 2024-01-01_2024-02-01.csv │   │   ├── ... │   │   └── 2024-10-01_2024-11-01.csv │   └── journal │   ├── amazon │   │   └── order-history.journal │   ├── credit │   │   ├── 2024-01-01_2024-02-01.journal │   │   ├── ... │   │   └── 2024-10-01_2024-11-01.journal │   └── debit │   ├── 2024-01-01_2024-02-01.journal │   ├── ... │   └── 2024-10-01_2024-11-01.journal ├── rules │   ├── amazon │   │   └── journal.rules │   ├── credit │   │   └── journal.rules │   ├── debit │   │   └── journal.rules │   └── common.rules ├── 2024.journal ├── Makefile └── README Process: Import – download a CSV for the month from each account and plop it into import/in/<account>/<dates>.csv Make – run make Squint – Look at git diff; if it looks good, git add . && git commit -m "💸" otherwise review hledger areg to see details. The Makefile generates everything under import/journal: journal files from my CSVs using their corresponding rules. reports in the export folder I include all the journal files in the 2024.journal with the line: include ./import/journal/*/*.journal Here’s the Makefile: SHELL := /bin/bash RAW_CSV = $(wildcard import/in/**/*.csv) JOURNALS = $(foreach file,$(RAW_CSV),$(subst /in/,/journal/,$(patsubst %.csv,%.journal,$(file)))) .PHONY: all all: $(JOURNALS) hledger is -f 2024.journal > export/2024-income-statement.txt hledger bs -f 2024.journal > export/2024-balance-sheet.txt .PHONY clean clean: rm -rf import/journal/**/*.journal import/journal/%.journal: import/in/%.csv @echo "Processing csv $< to $@" @echo "---" @mkdir -p $(shell dirname $@) @hledger print --rules-file rules/$(shell basename $$(dirname $<))/journal.rules -f "$<" > "$@" If I find anything amiss (e.g., if my balances are different than what the bank tells me), I look at hleger areg. I may tweak my rules or my CSVs and then I run make clean && make and try again. Simple, plain text accounting made simple. And if I ever want to dig deeper, hledger’s docs have more to teach. But for now, the balance of effort vs. reward is perfect. while reading a blog post from Jonathan Dowland↩︎ Note, this is covered by full-fledged hledger – Investements↩︎ Also covered in full-fledged hledger – Tax returns↩︎

5 months ago 40 votes
Subliminal git commits

Luckily, I speak Leet. – Amita Ramanujan, Numb3rs, CBS’s IRC Drama There’s an episode of the CBS prime-time drama Numb3rs that plumbs the depths of Dr. Joel Fleischman’s1 knowledge of IRC. In one scene, Fleischman wonders, “What’s ‘leet’”? “Leet” is writing that replaces letters with numbers, e.g., “Numb3rs,” where 3 stands in for e. In short, leet is like the heavy-metal “S” you drew in middle school: Sweeeeet. / \ / | \ | | | \ \ | | | \ | / \ / ASCII art version of your misspent youth. Following years of keen observation, I’ve noticed Git commit hashes are also letters and numbers. Git commit hashes are, as Fleischman might say, prime targets for l33tification. What can I spell with a git commit? DenITDao via orlybooks) With hexidecimal we can spell any word containing the set of letters {A, B, C, D, E, F}—DEADBEEF (a classic) or ABBABABE (for Mama Mia aficionados). This is because hexidecimal is a base-16 numbering system—a single “digit” represents 16 numbers: Base-10: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 15 Base-16: 0 1 2 3 4 5 6 7 8 9 A B C D E F Leet expands our palette of words—using 0, 1, and 5 to represent O, I, and S, respectively. I created a script that scours a few word lists for valid words and phrases. With it, I found masterpieces like DADB0D (dad bod), BADA55 (bad ass), and 5ADBAB1E5 (sad babies). Manipulating commit hashes for fun and no profit Git commit hashes are no mystery. A commit hash is the SHA-1 of a commit object. And a commit object is the commit message with some metadata. $ mkdir /tmp/BADA55-git && cd /tmp/BAD55-git $ git init Initialized empty Git repository in /tmp/BADA55-git/.git/ $ echo '# BADA55 git repo' > README.md && git add README.md && git commit -m 'Initial commit' [main (root-commit) 68ec0dd] Initial commit 1 file changed, 1 insertion(+) create mode 100644 README.md $ git log --oneline 68ec0dd (HEAD -> main) Initial commit Let’s confirm we can recreate the commit hash: $ git cat-file -p 68ec0dd > commit-msg $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 68ec0dd6dead532f18082b72beeb73bd828ee8fc /dev/fd/63 Our repo’s first commit has the hash 68ec0dd. My goal is: Make 68ec0dd be BADA55. Keep the commit message the same, visibly at least. But I’ll need to change the commit to change the hash. To keep those changes invisible in the output of git log, I’ll add a \t and see what happens to the hash. $ truncate -s -1 commit-msg # remove final newline $ printf '\t\n' >> commit-msg # Add a tab $ # Check the new SHA to see if it's BADA55 $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 27b22ba5e1c837a34329891c15408208a944aa24 /dev/fd/63 Success! I changed the SHA-1. Now to do this until we get to BADA55. Fortunately, user not-an-aardvark created a tool for that—lucky-commit that manipulates a commit message, adding a combination of \t and [:space:] characters until you hit a desired SHA-1. Written in rust, lucky-commit computes all 256 unique 8-bit strings composed of only tabs and spaces. And then pads out commits up to 48-bits with those strings, using worker threads to quickly compute the SHA-12 of each commit. It’s pretty fast: $ time lucky_commit BADA555 real 0m0.091s user 0m0.653s sys 0m0.007s $ git log --oneline bada555 (HEAD -> main) Initial commit $ xxd -c1 <(git cat-file -p 68ec0dd) | grep -cPo ': (20|09)' 12 $ xxd -c1 <(git cat-file -p HEAD) | grep -cPo ': (20|09)' 111 Now we have an more than an initial commit. We have a BADA555 initial commit. All that’s left to do is to make ALL our commits BADA55 by abusing git hooks. $ cat > .git/hooks/post-commit && chmod +x .git/hooks/post-commit #!/usr/bin/env bash echo 'L337-ifying!' lucky_commit BADA55 $ echo 'A repo that is very l33t.' >> README.md && git commit -a -m 'l33t' L337-ifying! [main 0e00cb2] l33t 1 file changed, 1 insertion(+) $ git log --oneline bada552 (HEAD -> main) l33t bada555 Initial commit And now I have a git repo almost as cool as the sweet “S” I drew in middle school. This is a Northern Exposure spin off, right? I’ve only seen 1:48 of the show…↩︎ or SHA-256 for repos that have made the jump to a more secure hash function↩︎

6 months ago 56 votes
The Pull Request

A brief and biased history. Oh yeah, there’s pull requests now – GitHub blog, Sat, 23 Feb 2008 When GitHub launched, it had no code review. Three years after launch, in 2011, GitHub user rtomayko became the first person to make a real code comment, which read, in full: “+1”. Before that, GitHub lacked any way to comment on code directly. Instead, pull requests were a combination of two simple features: Cross repository compare view – a feature they’d debuted in 2010—git diff in a web page. A comments section – a feature most blogs had in the 90s. There was no way to thread comments, and the comments were on a different page than the diff. GitHub pull requests circa 2010. This is from the official documentation on GitHub. Earlier still, when the pull request debuted, GitHub claimed only that pull requests were “a way to poke someone about code”—a way to direct message maintainers, but one that lacked any web view of the code whatsoever. For developers, it worked like this: Make a fork. Click “pull request”. Write a message in a text form. Send the message to someone1 with a link to your fork. Wait for them to reply. In effect, pull requests were a limited way to send emails to other GitHub users. Ten years after this humble beginning—seven years after the first code comment—when Microsoft acquired GitHub for $7.5 Billion, this cobbled-together system known as “GitHub flow” had become the default way to collaborate on code via Git. And I hate it. Pull requests were never designed. They emerged. But not from careful consideration of the needs of developers or maintainers. Pull requests work like they do because they were easy to build. In 2008, GitHub’s developers could have opted to use git format-patch instead of teaching the world to juggle branches. Or they might have chosen to generate pull requests using the git request-pull command that’s existed in Git since 2005 and is still used by the Linux kernel maintainers today2. Instead, they shrugged into GitHub flow, and that flow taught the world to use Git. And commit histories have sucked ever since. For some reason, github has attracted people who have zero taste, don’t care about commit logs, and can’t be bothered. – Linus Torvalds, 2012 “Someone” was a person chosen by you from a checklist of the people who had also forked this repository at some point.↩︎ Though to make small, contained changes you’d use git format-patch and git am.↩︎

7 months ago 72 votes
Git the stupid password store

.title {text-wrap:balance;} GIT - the stupid content tracker “git” can mean anything, depending on your mood. – Linus Torvalds, Initial revision of “git”, the information manager from hell Like most git features, gitcredentials(7) are obscure, byzantine, and incredibly useful. And, for me, they’re a nice, hacky solution to a simple problem. Problem: Home directories teeming with tokens. Too many programs store cleartext credentials in config files in my home directory, making exfiltration all too easy. Solution: For programs I write, I can use git credential fill – the password library I never knew I installed. #!/usr/bin/env bash input="\ protocol=https host=example.com user=thcipriani " eval "$(echo "$input" | git credential fill)" echo "The password is: $password" Which looks like this when you run it: $ ./prompt.sh Password for 'https://thcipriani@example.com': The password is: hunter2 What did git credentials fill do? Accepted a protocol, username, and host on standard input. Called out to my git credential helper My credential helper checked for credentials matching https://thcipriani@example.com and found nothing Since my credential helper came up empty, it prompted me for my password Finally, it echoed <key>=<value>\n pairs for the keys protocol, host, username, and password to standard output. If I want, I can tell my credential helper to store the information I entered: git credential approve <<EOF protocol=$protocol username=$username host=$host password=$password EOF If I do that, the next time I run the script, it finds the password without prompting: $ ./prompt.sh The password is: hunter2 What are git credentials? Surprisingly, the intended purpose of git credentials is NOT “a weird way to prompt for passwords.” The problem git credentials solve is this: With git over ssh, you use your keys. With git over https, you type a password. Over and over and over. Beleaguered git maintainers solved this dilemma with the credential storage system—git credentials. With the right configuration, git will stop asking for your password when you push to an https remote. Instead, git credentials retrieve and send auth info to remotes. On the labyrinthine options of git credentials My mind initially refused to learn git credentials due to its twisty maze of terms that all sound alike: git credential fill: how you invoke a user’s configured git credential helper git credential approve: how you save git credentials (if this is supported by the user’s git credential helper) git credential.helper: the git config that points to a script that poops out usernames and passwords. These helper scripts are often named git-credential-<something>. git-credential-cache: a specific, built-in git credential helper that caches credentials in memory for a while. git-credential-store: STOP. DON’T TOUCH. This is a specific, built-in git credential helper that stores credentials in cleartext in your home directory. Whomp whomp. git-credential-manager: a specific and confusingly named git credential helper from Microsoft®. If you’re on Linux or Mac, feel free to ignore it. But once I mapped the terms, I only needed to pick a git credential helper. Configuring good credential helpers The built-in git-credential-store is a bad credential helper—it saves your passwords in cleartext in ~/.git-credentials.1 If you’re on a Mac, you’re in luck2—one command points git credentials to your keychain: git config --global credential.helper osxkeychain Third-party developers have contributed helpers for popular password stores: 1Password pass: the standard Unix password manager OAuth Git’s documentation contains a list of credential-helpers, too Meanwhile, Linux and Windows have standard options. Git’s source repo includes helpers for these options in the contrib directory. On Linux, you can use libsecret. Here’s how I configured it on Debian: sudo apt install libsecret-1-0 libsecret-1-dev cd /usr/share/doc/git/contrib/credential/libsecret/ sudo make sudo mv git-credential-libsecret /usr/local/bin/ git config --global credential.helper libsecret On Windows, you can use the confusingly named git credential manager. I have no idea how to do this, and I refuse to learn. Now, if you clone a repo over https, you can push over https without pain3. Plus, you have a handy trick for shell scripts. git-credential-store is not a git credential helper of honor. No highly-esteemed passwords should be stored with it. This message is a warning about danger. The danger is still present, in your time, as it was in ours.↩︎ I think. I only have Linux computers to test this on, sorry ;_;↩︎ Or the config option pushInsteadOf, which is what I actually do.↩︎

8 months ago 56 votes

More in programming

Why did Stripe build Sorbet? (~2017).

Many hypergrowth companies of the 2010s battled increasing complexity in their codebase by decomposing their monoliths. Stripe was somewhat of an exception, largely delaying decomposition until it had grown beyond three thousand engineers and had accumulated a decade of development in its core Ruby monolith. Even now, significant portions of their product are maintained in the monolithic repository, and it’s safe to say this was only possible because of Sorbet’s impact. Sorbet is a custom static type checker for Ruby that was initially designed and implemented by Stripe engineers on their Product Infrastructure team. Stripe’s Product Infrastructure had similar goals to other companies’ Developer Experience or Developer Productivity teams, but it focused on improving productivity through changes in the internal architecture of the codebase itself, rather than relying solely on external tooling or processes. This strategy explains why Stripe chose to delay decomposition for so long, and how the Product Infrastructure team invested in developer productivity to deal with the challenges of a large Ruby codebase managed by a large software engineering team with low average tenure caused by rapid hiring. Before wrapping this introduction, I want to explicitly acknowledge that this strategy was spearheaded by Stripe’s Product Infrastructure team, not by me. Although I ultimately became responsible for that team, I can’t take credit for this strategy’s thinking. Rather, I was initially skeptical, preferring an incremental migration to an existing strongly-typed programming language, either Java for library coverage or Golang for Stripe’s existing familiarity. Despite my initial doubts, the Sorbet project eventually won me over with its indisputable results. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. Reading this document To apply this strategy, start at the top with Policy. To understand the thinking behind this strategy, read sections in reverse order, starting with Explore. More detail on this structure in Making a readable Engineering Strategy document. Policy & Operation The Product Infrastructure team is investing in Stripe’s developer experience by: Every six months, Product Infrastructure will select its three highest priority areas to focus, and invest a significant majority of its energy into those. We will provide minimal support for other areas. We commit to refreshing our priorities every half after running the developer productivity survey. We will further share our results, and priorities, in each Quarterly Business Review. Our three highest priority areas for this half are: Add static typing to the highest value portions of our Ruby codebase, such that we can run the type checker locally and on the test machines to identify errors more quickly. Support selective test execution such that engineers can quickly determine and run the most appropriate tests on their machine rather than delaying until tests run on the build server. Instrument test failures such that we have better data to prioritize future efforts. Static typing is not a typical solution to developer productivity, so it requires some explanation when we say this is our highest priority area for investment. Doubly so when we acknowledge that it will take us 12-24 months of much of the team’s time to get our type checker to an effective place. Our type checker, which we plan to name Sorbet, will allow us to continue developing within our existing Ruby codebase. It will further allow our product engineers to remain focused on developing new functionality rather than migrating existing functionality to new services or programming languages. Instead, our Product Infrastructure team will centrally absorb both the development of the type checker and the initial rollout to our codebase. It’s possible for Product Infrastructure to take on both, despite its fixed size. We’ll rely on a hybrid approach of deep-dives to add typing to particularly complex areas, and scripts to rewrite our code’s Abstract Syntax Trees (AST) for less complex portions. In the relatively unlikely event that this approach fails, the cost to Stripe is of a small, known size: approximately six months of half the Product Infrastructure team, which is what we anticipate requiring to determine if this approach is viable. Based on our knowledge of Facebook’s Hack project, we believe we can build a static type checker that runs locally and significantly faster than our test suite. It’s hard to make a precise guess now, but we think less than 30 seconds to type our entire codebase, despite it being quite large. This will allow for a highly productive local development experience, even if we are not able to speed up local testing. Even if we do speed up local testing, typing would help us eliminate one of the categories of errors that testing has been unable to eliminate, which is passing of unexpected types across code paths which have been tested for expected scenarios but not for entirely unexpected scenarios. Once the type checker has been validated, we can incrementally prioritize adding typing to the highest value places across the codebase. We do not need to wholly type our codebase before we can start getting meaningful value. In support of these static typing efforts, we will advocate for product engineers at Stripe to begin development using the Command Query Responsibility Segregation (CQRS) design pattern, which we believe will provide high-leverage interfaces for incrementally introducing static typing into our codebase. Selective test execution will allow developers to quickly run appropriate tests locally. This will allow engineers to stay in a tight local development loop, speeding up development of high quality code. Given that our codebase is not currently statically typed, inferring which tests to run is rather challenging. With our very high test coverage, and the fact that all tests will still be run before deployment to the production environment, we believe that we can rely on statistically inferring which tests are likely to fail when a given file is modified. Instrumenting test failures is our third, and lowest priority, project for this half. Our focus this half is purely on annotating errors for which we have high conviction about their source, whether infrastructure or test issues. For escalations and issues, reach out in the #product-infra channel. Diagnose In 2017, Stripe is a company of about 1,000 people, including 400 software engineers. We aim to grow our organization by about 70% year-over-year to meet increasing demand for a broader product portfolio and to scale our existing products and infrastructure to accommodate user growth. As our production stability has improved over the past several years, we have now turned our focus towards improving developer productivity. Our current diagnosis of our developer productivity is: We primarily fund developer productivity for our Ruby-authoring software engineers via our Product Infrastructure team. The Ruby-focused portion of that team has about ten engineers on it today, and is unlikely to significantly grow in the future. (If we do expand, we are likely to staff non-Ruby ecosystems like Scala or Golang.) We have two primary mechanisms for understanding our engineer’s developer experience. The first is standard productivity metrics around deploy time, deploy stability, test coverage, test time, test flakiness, and so on. The second is a twice annual developer productivity survey. Looking at our productivity metrics, our test coverage remains extremely high, with coverage above 99% of lines, and tests are quite slow to run locally. They run quickly in our infrastructure because they are multiplexed across a large fleet of test runners. Tests have become slow enough to run locally that an increasing number of developers run an overly narrow subset of tests, or entirely skip running tests until after pushing their changes. They instead rely on our test servers to run against their pull request’s branch, which works well enough, but significantly slows down developer iteration time because the merge, build, and test cycle takes twenty to thirty minutes to complete. By the time their build-test cycle completes, they’ve lost their focus and maybe take several hours to return to addressing the results. There is significant disagreement about whether tests are becoming flakier due to test infrastructure issues, or due to quality issues of the tests themselves. At this point, there is no trustworthy dataset that allows us to attribute between those two causes. Feedback from the twice annual developer productivity survey supports the above diagnosis, and adds some additional nuance. Most concerning, although long-tenured Stripe engineers find themselves highly productive in our codebase, we increasingly hear in the survey that newly hired engineers with long tenures at other companies find themselves unproductive in our codebase. Specifically, they find it very difficult to determine how to safely make changes in our codebase. Our product codebase is entirely implemented in a single Ruby monolith. There is one narrow exception, a Golang service handling payment tokenization, which we consider out of scope for two reasons. First, it is kept intentionally narrow in order to absorb our SOC1 compliance obligations. Second, developers in that environment have not raised concerns about their productivity. Our data infrastructure is implemented in Scala. While these developers have concerns–primarily slow build times–they manage their build and deployment infrastructure independently, and the group remains relatively small. Ruby is not a highly performant programming language, but we’ve found it sufficiently efficient for our needs. Similarly, other languages are more cost-efficient from a compute resources perspective, but a significant majority of our spend is on real-time storage and batch computation. For these reasons alone, we would not consider replacing Ruby as our core programming language. Our Product Infrastructure team is about ten engineers, supporting about 250 product engineers. We anticipate this group growing modestly over time, but certainly sublinearly to the overall growth of product engineers. Developers working in Golang and Scala routinely ask for more centralized support, but it’s challenging to prioritize those requests as we’re forced to consider the return on improving the experience for 240 product engineers working in Ruby vs 10 in Golang or 40 data engineers in Scala. If we introduced more programming languages, this prioritization problem would become increasingly difficult, and we are already failing to support additional languages.

yesterday 5 votes
The new Framework 13 HX370

The new AMD HX370 option in the Framework 13 is a good step forward in performance for developers. It runs our HEY test suite in 2m7s, compared to 2m43s for the 7840U (and 2m49s for a M4 Pro!). It's also about 20% faster in most single-core tasks than the 7840U. But is that enough to warrant the jump in price? AMD's latest, best chips have suddenly gotten pretty expensive. The F13 w/ HX370 now costs $1,992 with 32GB RAM / 1TB. Almost the same an M4 Pro MBP14 w/ 24GB / 1TB ($2,199). I'd pick the Framework any day for its better keyboard, 3:2 matte screen, repairability, and superb Linux compatibility, but it won't be because the top option is "cheaper" any more.  Of course you could also just go with the budget 6-core Ryzen AI 5 340 in same spec for $1,362. I'm sure that's a great machine too. But maybe the sweet spot is actually the Ryzen AI 7 350. It "only" has 8 cores (vs 12 on the 370), but four of those are performance cores -- the same as the 370. And it's $300 cheaper. So ~$1,600 gets you out the door. I haven't actually tried the 350, though, so that's just speculation. I've been running the 370 for the last few months. Whichever chip you choose, the rest of the Framework 13 package is as good as it ever was. This remains my favorite laptop of at least the last decade. I've been running one for over a year now, and combined with Omakub + Neovim, it's the first machine in forever where I've actually enjoyed programming on a 13" screen. The 3:2 aspect ratio combined with Linux's superb multiple desktops that switch with 0ms lag and no animations means I barely miss the trusted 6K Apple XDR screen when working away from the desk. The HX370 gives me about 6 hours of battery life in mixed use. About the same as the old 7840U. Though if all I'm doing is writing, I can squeeze that to 8-10 hours. That's good enough for me, but not as good as a Qualcomm machine or an Apple M-chip machine. For some people, those extra hours really make the difference. What does make a difference, of course, is Linux. I've written repeatedly about how much of a joy it's been to rediscover Linux on the desktop, and it's a joy that keeps on giving. For web work, it's so good. And for any work that requires even a minimum of Docker, it's so fast (as the HEY suite run time attests). Apple still has a strong hardware game, but their software story is falling apart. I haven't heard many people sing the praises of new iOS or macOS releases in a long while. It seems like without an asshole in charge, both have move towards more bloat, more ads, more gimmicks, more control. Linux is an incredible antidote to this nonsense these days. It's also just fun! Seeing AMD catch up in outright performance if not efficiency has been a delight. Watching Framework perfect their 13" laptop while remaining 100% backwards compatible in terms of upgrades with the first versions is heartwarming. And getting to test the new Framework Desktop in advance of its Q3 release has only affirmed my commitment to both. But on the new HX370, it's in my opinion the best Linux laptop you can buy today, which by extension makes it the best web developer laptop too. The top spec might have gotten a bit pricey, but there are options all along the budget spectrum, which retains all the key ingredients any way. Hard to go wrong. Forza Framework!

yesterday 2 votes
Beyond `None`: actionable error messages for `keyring.get_password()`

I’m a big fan of keyring, a Python module made by Jason R. Coombs for storing secrets in the system keyring. It works on multiple operating systems, and it knows what password store to use for each of them. For example, if you’re using macOS it puts secrets in the Keychain, but if you’re on Windows it uses Credential Locker. The keyring module is a safe and portable way to store passwords, more secure than using a plaintext config file or an environment variable. The same code will work on different platforms, because keyring handles the hard work of choosing which password store to use. It has a straightforward API: the keyring.set_password and keyring.get_password functions will handle a lot of use cases. >>> import keyring >>> keyring.set_password("xkcd", "alexwlchan", "correct-horse-battery-staple") >>> keyring.get_password("xkcd", "alexwlchan") "correct-horse-battery-staple" Although this API is simple, it’s not perfect – I have some frustrations with the get_password function. In a lot of my projects, I’m now using a small function that wraps get_password. What do I find frustrating about keyring.get_password? If you look up a password that isn’t in the system keyring, get_password returns None rather than throwing an exception: >>> print(keyring.get_password("xkcd", "the_invisible_man")) None I can see why this makes sense for the library overall – a non-existent password is very normal, and not exceptional behaviour – but in my projects, None is rarely a usable value. I normally use keyring to retrieve secrets that I need to access protected resources – for example, an API key to call an API that requires authentication. If I can’t get the right secrets, I know I can’t continue. Indeed, continuing often leads to more confusing errors when some other function unexpectedly gets None, rather than a string. For a while, I wrapped get_password in a function that would throw an exception if it couldn’t find the password: def get_required_password(service_name: str, username: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError(f"Could not retrieve password {(service_name, username)}") return password When I use this function, my code will fail as soon as it fails to retrieve a password, rather than when it tries to use None as the password. This worked well enough for my personal projects, but it wasn’t a great fit for shared projects. I could make sense of the error, but not everyone could do the same. What’s that password meant to be? A good error message explains what’s gone wrong, and gives the reader clear steps for fixing the issue. The error message above is only doing half the job. It tells you what’s gone wrong (it couldn’t get the password) but it doesn’t tell you how to fix it. As I started using this snippet in codebases that I work on with other developers, I got questions when other people hit this error. They could guess that they needed to set a password, but the error message doesn’t explain how, or what password they should be setting. For example, is this a secret they should pick themselves? Is it a password in our shared password vault? Or do they need an API key for a third-party service? If so, where do they find it? I still think my initial error was an improvement over letting None be used in the rest of the codebase, but I realised I could go further. This is my extended wrapper: def get_required_password(service_name: str, username: str, explanation: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception and explain to the user how to set the required password. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError( "Unable to retrieve required password from the system keyring!\n" "\n" "You need to:\n" "\n" f"1/ Get the password. Here's how: {explanation}\n" "\n" "2/ Save the new password in the system keyring:\n" "\n" f" keyring set {service_name} {username}\n" ) return password The explanation argument allows me to explain what the password is for to a future reader, and what value it should have. That information can often be found in a code comment or in documentation, but putting it in an error message makes it more visible. Here’s one example: get_required_password( "flask_app", "secret_key", explanation=( "Pick a random value, e.g. with\n" "\n" " python3 -c 'import secrets; print(secrets.token_hex())'\n" "\n" "This password is used to securely sign the Flask session cookie. " "See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY" ), ) If you call this function and there’s no keyring entry for flask_app/secret_key, you get the following error: Unable to retrieve required password from the system keyring! You need to: 1/ Get the password. Here's how: Pick a random value, e.g. with python3 -c 'import secrets; print(secrets.token_hex())' This password is used to securely sign the Flask session cookie. See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY 2/ Save the new password in the system keyring: keyring set flask_app secret_key It’s longer, but this error message is far more informative. It tells you what’s wrong, how to save a password, and what the password should be. This is based on a real example where the previous error message led to a misunderstanding. A co-worker saw a missing password called “secret key” and thought it referred to a secret key for calling an API, and didn’t realise it was actually for signing Flask session cookies. Now I can write a more informative error message, I can prevent that misunderstanding happening again. (We also renamed the secret, for additional clarity.) It takes time to write this explanation, which will only ever be seen by a handful of people, but I think it’s important. If somebody sees it at all, it’ll be when they’re setting up the project for the first time. I want that setup process to be smooth and straightforward. I don’t use this wrapper in all my code, particularly small or throwaway toys that won’t last long enough for this to be an issue. But in larger codebases that will be used by other developers, and which I expect to last a long time, I use it extensively. Writing a good explanation now can avoid frustration later. [If the formatting of this post looks odd in your feed reader, visit the original article]

yesterday 2 votes
Kagi Assistant is now available to all users!

At Kagi, our mission is simple: to humanise the web.

yesterday 1 votes
The Halting Problem is a terrible example of NP-Harder

Short one this time because I have a lot going on this week. In computation complexity, NP is the class of all decision problems (yes/no) where a potential proof (or "witness") for "yes" can be verified in polynomial time. For example, "does this set of numbers have a subset that sums to zero" is in NP. If the answer is "yes", you can prove it by presenting a set of numbers. We would then verify the witness by 1) checking that all the numbers are present in the set (~linear time) and 2) adding up all the numbers (also linear). NP-complete is the class of "hardest possible" NP problems. Subset sum is NP-complete. NP-hard is the set all problems at least as hard as NP-complete. Notably, NP-hard is not a subset of NP, as it contains problems that are harder than NP-complete. A natural question to ask is "like what?" And the canonical example of "NP-harder" is the halting problem (HALT): does program P halt on input C? As the argument goes, it's undecidable, so obviously not in NP. I think this is a bad example for two reasons: All NP requires is that witnesses for "yes" can be verified in polynomial time. It does not require anything for the "no" case! And even though HP is undecidable, there is a decidable way to verify a "yes": let the witness be "it halts in N steps", then run the program for that many steps and see if it halted by then. To prove HALT is not in NP, you have to show that this verification process grows faster than polynomially. It does (as busy beaver is uncomputable), but this all makes the example needlessly confusing.1 "What's bigger than a dog? THE MOON" Really (2) bothers me a lot more than (1) because it's just so inelegant. It suggests that NP-complete is the upper bound of "solvable" problems, and after that you're in full-on undecidability. I'd rather show intuitive problems that are harder than NP but not that much harder. But in looking for a "slightly harder" problem, I ran into an, ah, problem. It seems like the next-hardest class would be EXPTIME, except we don't know for sure that NP != EXPTIME. We know for sure that NP != NEXPTIME, but NEXPTIME doesn't have any intuitive, easily explainable problems. Most "definitely harder than NP" problems require a nontrivial background in theoretical computer science or mathematics to understand. There is one problem, though, that I find easily explainable. Place a token at the bottom left corner of a grid that extends infinitely up and right, call that point (0, 0). You're given list of valid displacement moves for the token, like (+1, +0), (-20, +13), (-5, -6), etc, and a target point like (700, 1). You may make any sequence of moves in any order, as long as no move ever puts the token off the grid. Does any sequence of moves bring you to the target? This is PSPACE-complete, I think, which still isn't proven to be harder than NP-complete (though it's widely believed). But what if you increase the number of dimensions of the grid? Past a certain number of dimensions the problem jumps to being EXPSPACE-complete, and then TOWER-complete (grows tetrationally), and then it keeps going. Some point might recognize this as looking a lot like the Ackermann function, and in fact this problem is ACKERMANN-complete on the number of available dimensions. A friend wrote a Quanta article about the whole mess, you should read it. This problem is ludicrously bigger than NP ("Chicago" instead of "The Moon"), but at least it's clearly decidable, easily explainable, and definitely not in NP. It's less confusing if you're taught the alternate (and original!) definition of NP, "the class of problems solvable in polynomial time by a nondeterministic Turing machine". Then HALT can't be in NP because otherwise runtime would be bounded by an exponential function. ↩

2 days ago 5 votes