Chartjunk: What I've learned about data visualization

125

from Tyler Cipriani: blog [alt+shift+b] in programming

For many people the first word that comes to mind when they think about statistical charts is “lie.” – Edward R. Tufte, The Visual Display of Quantitative Information I wish we could all agree: pie charts should die. I know this is unreasonable. And pie charts are only part of the problem. The problem is data visualizations that show what’s already obvious. After spending some time learning about data visualization, I’ve come to two important conclusions: Most charts are useless Good data visualizations are powerful tools to spark curiosity Creating a powerful chart is an underrated superpower for engineers. Here are some ideas that helped me learn how to do that. Why I loathe a good pie chart Take a look at this chart from the Wikimedia Foundation’s 2023–2024 budget projections: Wikimedia Foundation 2023–2024 Draft Budget This is a default Google Sheets chart—the first thing Sheets recommends for this data. What I’m able to glean from this chart: “Building analytics & ML Services”:...

a year ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Tyler Cipriani: blog

Boox Go 10.3, two months in

[The] Linux kernel uses GPLv2, and if you distribute GPLv2 code, you have to provide a copy of the source (and modifications) once someone asks for it. And now I’m asking nicely for you to do so 🙂 – Joga, bbs.onyx-international.com Boox in split screen, typewriter mode In January, I bought a Boox Go 10.3—a 10.3-inch, 300-ppi, e-ink Android tablet. After two months, I use the Boox daily—it’s replaced my planner, notebook, countless PDF print-offs, and the good parts of my phone. But Boox’s parent company, Onyx, is sketchy. I’m conflicted. The Boox Go is a beautiful, capable tablet that I use every day, but I recommend avoiding as long as Onyx continues to disregard the rights of its users. How I’m using my Boox My e-ink floor desk Each morning, I plop down in front of my MagicHold laptop stand and journal on my Boox with Obsidian. I use Syncthing to back up my planner and sync my Zotero library between my Boox and laptop. In the evening, I review my PDF planner and plot for tomorrow. I use these apps: Obsidian – a markdown editor that syncs between all my devices with no fuss for $8/mo. Syncthing – I love Syncthing—it’s an encrypted, continuous file sync-er without a centralized server. Meditation apps1 – Guided meditation away from the blue light glow of my phone or computer is better. Before buying the Boox, I considered a reMarkable. The reMarkable Paper Pro has a beautiful color screen with a frontlight, a nice pen, and a “type folio,” plus it’s certified by the Calm Tech Institute. But the reMarkable is a distraction-free e-ink tablet. Meanwhile, I need distraction-lite. What I like Calm(ish) technology – The Boox is an intentional device. Browsing the internet, reading emails, and watching videos is hard, but that’s good. Apps – Google Play works out of the box. I can install F-Droid and change my launcher without difficulty. Split screen – The built-in launcher has a split screen feature. I use it to open a PDF side-by-side with a notes doc. Reading – The screen is a 300ppi Carta 1200, making text crisp and clear. What I dislike I filmed myself typing at 240fps, each frame is 4.17ms. Boox’s typing latency is between 150ms and 275ms at the fastest refresh rate inside Obsidian. Typing – Typing latency is noticeable. At Boox’s highest refresh rate, after hitting a key, text takes between 150ms to 275ms to appear. I can still type, though it’s distracting at times. The horror of the default pen Accessories Pen – The default pen looks like a child’s whiteboard marker and feels cheap. I replaced it with the Kindle Scribe Premium pen, and the writing experience is vastly improved. Cover – It’s impossible to find a nice cover. I’m using a $15 cover that I’m encasing in stickers. Tool switching – Swapping between apps is slow and clunky. I blame Android and the current limitations of e-ink more than Boox. No frontlight – The Boox’s lack of frontlight prevents me from reading more with it. I knew this when I bought my Boox, but devices with frontlights seem to make other compromises. Onyx The Chinese company behind Boox, Onyx International, Inc., runs the servers where Boox shuttles tracking information. I block this traffic with Pi-Hole2. pihole-ing whatever telemetry Boox collects I inspected this traffic via Mitm proxy—most traffic was benign, though I never opted into sending any telemetry (nor am I logged in to a Boox account). But it’s also an Android device, so it’s feeding telemetry into Google’s gaping maw, too. Worse, Onyx is flouting the terms of the GNU Public License, declining to release Linux kernel modifications to users. This is anathema to me—GPL violations are tantamount to theft. Onyx’s disregard for user rights makes me regret buying the Boox. Verdict I’ll continue to use the Boox and feel bad about it. I hope my digging in this post will help the next person. Unfortunately, the e-ink tablet market is too niche to support the kind of solarpunk future I’d always imagined. But there’s an opportunity for an open, Linux-based tablet to dominate e-ink. Linux is playing catch-up on phones with PostmarketOS. Meanwhile, the best e-ink tablets have to offer are old, unupdateable versions of Android, like the OS on the Boox. In the future, I’d love to pay a license- and privacy-respecting company for beautiful, calm technology and recommend their product to everyone. But today is not the future. I go back and forth between “Waking Up” and “Calm”↩︎ Using github.com/JordanEJ/Onyx-Boox-Blocklist↩︎

2 months ago • 17 votes

Eventually consistent plain text accounting

.title { text-wrap: balance } Spending for October, generated by piping hledger → R Over the past six months, I’ve tracked my money with hledger—a plain text double-entry accounting system written in Haskell. It’s been surprisingly painless. My previous attempts to pick up real accounting tools floundered. Hosted tools are privacy nightmares, and my stint with GnuCash didn’t last. But after stumbling on Dmitry Astapov’s “Full-fledged hledger” wiki1, it clicked—eventually consistent accounting. Instead of modeling your money all at once, take it one hacking session at a time. It should be easy to work towards eventual consistency. […] I should be able to [add financial records] bit by little bit, leaving things half-done, and picking them up later with little (mental) effort. – Dmitry Astapov, Full-Fledged Hledger Principles of my system I’ve cobbled together a system based on these principles: Avoid manual entry – Avoid typing in each transaction. Instead, rely on CSVs from the bank. CSVs as truth – CSVs are the only things that matter. Everything else can be blown away and rebuilt anytime. Embrace version control – Keep everything under version control in Git for easy comparison and safe experimentation. Learn hledger in five minutes hledger concepts are heady, but its use is simple. I divide the core concepts into two categories: Stuff hledger cares about: Transactions – how hledger moves money between accounts. Journal files – files full of transactions Stuff I care about: Rules files – how I set up accounts, import CSVs, and move money between accounts. Reports – help me see where my money is going and if I messed up my rules. Transactions move money between accounts: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 This transaction shows that on Jan 1, 2024, money moved from income:work into assets:checking—Payday. The sum of each transaction should be $0. Money comes from somewhere, and the same amount goes somewhere else—double-entry accounting. This is powerful technology—it makes mistakes impossible to ignore. Journal files are text files containing one or more transactions: 2024-01-01 Payday income:work $-100.00 assets:checking $100.00 2024-01-02 QUANSHENG UVK5 assets:checking $-29.34 expenses:fun:radio $29.34 Rules files transform CSVs into journal files via regex matching. Here’s a CSV from my bank: Transaction Date,Description,Category,Type,Amount,Memo 09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00, 09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31, 09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56, 09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00, 09/02/2024,Amazon web services,Personal,Sale,-17.89, And here’s a checking.rules to transform that CSV into a journal file so I can use it with hledger: # checking.rules # -------------- # Map CSV fields → hledger fields[0] fields date,description,category,type,amount,memo,_ # `account1`: the account for the whole CSV.[1] account1 assets:checking account2 expenses:unknown skip 1 date-format %m/%d/%Y currency $ if %type Payment account2 income:unknown if %category Food & Drink account2 expenses:food:dining # [0]: <https://hledger.org/hledger.html#field-names> # [1]: <https://hledger.org/hledger.html#account-field> With these two files (checking.rules and 2024-09_checking.csv), I can make the CSV into a journal: $ > 2024-09_checking.journal \ hledger print \ --rules-file checking.rules \ -f 2024-09_checking.csv $ head 2024-09_checking.journal 2024-09-01 DEPOSIT Paycheck assets:checking $1000.00 income:unknown $-1000.00 2024-09-02 Amazon web services assets:checking $-17.89 expenses:unknown $17.89 Reports are interesting ways to view transactions between accounts. There are registers, balance sheets, and income statements: $ hledger incomestatement \ --depth=2 \ --file=2024-09_bank.journal Revenues: $1000.00 income:unknown ----------------------- $1000.00 Expenses: $42.31 expenses:food $63.45 expenses:unknown ----------------------- $105.76 ----------------------- Net: $894.24 At the beginning of September, I spent $105.76 and made $1000, leaving me with $894.24. But a good chunk is going to the default expense account, expenses:unknown. I can use the hleger aregister to see what those transactions are: $ hledger areg expenses:unknown \ --file=2024-09_checking.journal \ -O csv | \ csvcut -c description,change | \ csvlook | description | change | | ------------------------ | ------ | | OBSIDIAN.MD | 10.00 | | Amazon web services | 17.89 | | Amazon.com*XXXXXXXXY | 35.56 | l Then, I can add some more rules to my checking.rules: if OBSIDIAN.MD account2 expenses:personal:subscriptions if Amazon web services account2 expenses:personal:web:hosting if Amazon.com account2 expenses:personal:shopping:amazon Now, I can reprocess my data to get a better picture of my spending: $ > 2024-09_bank.journal \ hledger print \ --rules-file bank.rules \ -f 2024-09_bank.csv $ hledger bal expenses \ --depth=3 \ --percent \ -f 2024-09_checking2.journal 30.0 % expenses:food:dining 33.6 % expenses:personal:shopping 9.5 % expenses:personal:subscriptions 16.9 % expenses:personal:web -------------------- 100.0 % For the Amazon.com purchase, I lumped it into the expenses:personal:shopping account. But I could dig deeper—download my order history from Amazon and categorize that spending. This is the power of working bit-by-bit—the data guides you to the next, deeper rabbit hole. Goals and non-goals Why am I doing this? For years, I maintained a monthly spreadsheet of account balances. I had a balance sheet. But I still had questions. Spending over six months, generated by piping hledger → gnuplot Before diving into accounting software, these were my goals: Granular understanding of my spending – The big one. This is where my monthly spreadsheet fell short. I knew I had money in the bank—I kept my monthly balance sheet. I budgeted up-front the % of my income I was saving. But I had no idea where my other money was going. Data privacy – I’m unwilling to hand the keys to my accounts to YNAB or Mint. Increased value over time – The more time I put in, the more value I want to get out—this is what you get from professional tools built for nerds. While I wished for low-effort setup, I wanted the tool to be able to grow to more uses over time. Non-goals—these are the parts I never cared about: Investment tracking – For now, I left this out of scope. Between monthly balances in my spreadsheet and online investing tools’ ability to drill down, I was fine.2 Taxes – Folks smarter than me help me understand my yearly taxes.3 Shared system – I may want to share reports from this system, but no one will have to work in it except me. Cash – Cash transactions are unimportant to me. I withdraw money from the ATM sometimes. It evaporates. hledger can track all these things. My setup is flexible enough to support them someday. But that’s unimportant to me right now. Monthly maintenance I spend about an hour a month checking in on my money Which frees me to spend time making fancy charts—an activity I perversely enjoy. Income vs. Expense, generated by piping hledger → gnuplot Here’s my setup: $ tree ~/Documents/ledger . ├── export │ ├── 2024-balance-sheet.txt │ └── 2024-income-statement.txt ├── import │ ├── in │ │ ├── amazon │ │ │ └── order-history.csv │ │ ├── credit │ │ │ ├── 2024-01-01_2024-02-01.csv │ │ │ ├── ... │ │ │ └── 2024-10-01_2024-11-01.csv │ │ └── debit │ │ ├── 2024-01-01_2024-02-01.csv │ │ ├── ... │ │ └── 2024-10-01_2024-11-01.csv │ └── journal │ ├── amazon │ │ └── order-history.journal │ ├── credit │ │ ├── 2024-01-01_2024-02-01.journal │ │ ├── ... │ │ └── 2024-10-01_2024-11-01.journal │ └── debit │ ├── 2024-01-01_2024-02-01.journal │ ├── ... │ └── 2024-10-01_2024-11-01.journal ├── rules │ ├── amazon │ │ └── journal.rules │ ├── credit │ │ └── journal.rules │ ├── debit │ │ └── journal.rules │ └── common.rules ├── 2024.journal ├── Makefile └── README Process: Import – download a CSV for the month from each account and plop it into import/in/<account>/<dates>.csv Make – run make Squint – Look at git diff; if it looks good, git add . && git commit -m "💸" otherwise review hledger areg to see details. The Makefile generates everything under import/journal: journal files from my CSVs using their corresponding rules. reports in the export folder I include all the journal files in the 2024.journal with the line: include ./import/journal/*/*.journal Here’s the Makefile: SHELL := /bin/bash RAW_CSV = $(wildcard import/in/**/*.csv) JOURNALS = $(foreach file,$(RAW_CSV),$(subst /in/,/journal/,$(patsubst %.csv,%.journal,$(file)))) .PHONY: all all: $(JOURNALS) hledger is -f 2024.journal > export/2024-income-statement.txt hledger bs -f 2024.journal > export/2024-balance-sheet.txt .PHONY clean clean: rm -rf import/journal/**/*.journal import/journal/%.journal: import/in/%.csv @echo "Processing csv $< to $@" @echo "---" @mkdir -p $(shell dirname $@) @hledger print --rules-file rules/$(shell basename $$(dirname $<))/journal.rules -f "$<" > "$@" If I find anything amiss (e.g., if my balances are different than what the bank tells me), I look at hleger areg. I may tweak my rules or my CSVs and then I run make clean && make and try again. Simple, plain text accounting made simple. And if I ever want to dig deeper, hledger’s docs have more to teach. But for now, the balance of effort vs. reward is perfect. while reading a blog post from Jonathan Dowland↩︎ Note, this is covered by full-fledged hledger – Investements↩︎ Also covered in full-fledged hledger – Tax returns↩︎

6 months ago • 43 votes

Subliminal git commits

Luckily, I speak Leet. – Amita Ramanujan, Numb3rs, CBS’s IRC Drama There’s an episode of the CBS prime-time drama Numb3rs that plumbs the depths of Dr. Joel Fleischman’s1 knowledge of IRC. In one scene, Fleischman wonders, “What’s ‘leet’”? “Leet” is writing that replaces letters with numbers, e.g., “Numb3rs,” where 3 stands in for e. In short, leet is like the heavy-metal “S” you drew in middle school: Sweeeeet. / \ / | \ | | | \ \ | | | \ | / \ / ASCII art version of your misspent youth. Following years of keen observation, I’ve noticed Git commit hashes are also letters and numbers. Git commit hashes are, as Fleischman might say, prime targets for l33tification. What can I spell with a git commit? DenITDao via orlybooks) With hexidecimal we can spell any word containing the set of letters {A, B, C, D, E, F}—DEADBEEF (a classic) or ABBABABE (for Mama Mia aficionados). This is because hexidecimal is a base-16 numbering system—a single “digit” represents 16 numbers: Base-10: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 16 15 Base-16: 0 1 2 3 4 5 6 7 8 9 A B C D E F Leet expands our palette of words—using 0, 1, and 5 to represent O, I, and S, respectively. I created a script that scours a few word lists for valid words and phrases. With it, I found masterpieces like DADB0D (dad bod), BADA55 (bad ass), and 5ADBAB1E5 (sad babies). Manipulating commit hashes for fun and no profit Git commit hashes are no mystery. A commit hash is the SHA-1 of a commit object. And a commit object is the commit message with some metadata. $ mkdir /tmp/BADA55-git && cd /tmp/BAD55-git $ git init Initialized empty Git repository in /tmp/BADA55-git/.git/ $ echo '# BADA55 git repo' > README.md && git add README.md && git commit -m 'Initial commit' [main (root-commit) 68ec0dd] Initial commit 1 file changed, 1 insertion(+) create mode 100644 README.md $ git log --oneline 68ec0dd (HEAD -> main) Initial commit Let’s confirm we can recreate the commit hash: $ git cat-file -p 68ec0dd > commit-msg $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 68ec0dd6dead532f18082b72beeb73bd828ee8fc /dev/fd/63 Our repo’s first commit has the hash 68ec0dd. My goal is: Make 68ec0dd be BADA55. Keep the commit message the same, visibly at least. But I’ll need to change the commit to change the hash. To keep those changes invisible in the output of git log, I’ll add a \t and see what happens to the hash. $ truncate -s -1 commit-msg # remove final newline $ printf '\t\n' >> commit-msg # Add a tab $ # Check the new SHA to see if it's BADA55 $ sha1sum <(cat \ <(printf "commit ") \ <(wc -c < commit-msg | tr -d '\n') \ <(printf '%b' '\0') commit-msg) 27b22ba5e1c837a34329891c15408208a944aa24 /dev/fd/63 Success! I changed the SHA-1. Now to do this until we get to BADA55. Fortunately, user not-an-aardvark created a tool for that—lucky-commit that manipulates a commit message, adding a combination of \t and [:space:] characters until you hit a desired SHA-1. Written in rust, lucky-commit computes all 256 unique 8-bit strings composed of only tabs and spaces. And then pads out commits up to 48-bits with those strings, using worker threads to quickly compute the SHA-12 of each commit. It’s pretty fast: $ time lucky_commit BADA555 real 0m0.091s user 0m0.653s sys 0m0.007s $ git log --oneline bada555 (HEAD -> main) Initial commit $ xxd -c1 <(git cat-file -p 68ec0dd) | grep -cPo ': (20|09)' 12 $ xxd -c1 <(git cat-file -p HEAD) | grep -cPo ': (20|09)' 111 Now we have an more than an initial commit. We have a BADA555 initial commit. All that’s left to do is to make ALL our commits BADA55 by abusing git hooks. $ cat > .git/hooks/post-commit && chmod +x .git/hooks/post-commit #!/usr/bin/env bash echo 'L337-ifying!' lucky_commit BADA55 $ echo 'A repo that is very l33t.' >> README.md && git commit -a -m 'l33t' L337-ifying! [main 0e00cb2] l33t 1 file changed, 1 insertion(+) $ git log --oneline bada552 (HEAD -> main) l33t bada555 Initial commit And now I have a git repo almost as cool as the sweet “S” I drew in middle school. This is a Northern Exposure spin off, right? I’ve only seen 1:48 of the show…↩︎ or SHA-256 for repos that have made the jump to a more secure hash function↩︎

7 months ago • 59 votes

The Pull Request

A brief and biased history. Oh yeah, there’s pull requests now – GitHub blog, Sat, 23 Feb 2008 When GitHub launched, it had no code review. Three years after launch, in 2011, GitHub user rtomayko became the first person to make a real code comment, which read, in full: “+1”. Before that, GitHub lacked any way to comment on code directly. Instead, pull requests were a combination of two simple features: Cross repository compare view – a feature they’d debuted in 2010—git diff in a web page. A comments section – a feature most blogs had in the 90s. There was no way to thread comments, and the comments were on a different page than the diff. GitHub pull requests circa 2010. This is from the official documentation on GitHub. Earlier still, when the pull request debuted, GitHub claimed only that pull requests were “a way to poke someone about code”—a way to direct message maintainers, but one that lacked any web view of the code whatsoever. For developers, it worked like this: Make a fork. Click “pull request”. Write a message in a text form. Send the message to someone1 with a link to your fork. Wait for them to reply. In effect, pull requests were a limited way to send emails to other GitHub users. Ten years after this humble beginning—seven years after the first code comment—when Microsoft acquired GitHub for $7.5 Billion, this cobbled-together system known as “GitHub flow” had become the default way to collaborate on code via Git. And I hate it. Pull requests were never designed. They emerged. But not from careful consideration of the needs of developers or maintainers. Pull requests work like they do because they were easy to build. In 2008, GitHub’s developers could have opted to use git format-patch instead of teaching the world to juggle branches. Or they might have chosen to generate pull requests using the git request-pull command that’s existed in Git since 2005 and is still used by the Linux kernel maintainers today2. Instead, they shrugged into GitHub flow, and that flow taught the world to use Git. And commit histories have sucked ever since. For some reason, github has attracted people who have zero taste, don’t care about commit logs, and can’t be bothered. – Linus Torvalds, 2012 “Someone” was a person chosen by you from a checklist of the people who had also forked this repository at some point.↩︎ Though to make small, contained changes you’d use git format-patch and git am.↩︎

8 months ago • 75 votes

Git the stupid password store

.title {text-wrap:balance;} GIT - the stupid content tracker “git” can mean anything, depending on your mood. – Linus Torvalds, Initial revision of “git”, the information manager from hell Like most git features, gitcredentials(7) are obscure, byzantine, and incredibly useful. And, for me, they’re a nice, hacky solution to a simple problem. Problem: Home directories teeming with tokens. Too many programs store cleartext credentials in config files in my home directory, making exfiltration all too easy. Solution: For programs I write, I can use git credential fill – the password library I never knew I installed. #!/usr/bin/env bash input="\ protocol=https host=example.com user=thcipriani " eval "$(echo "$input" | git credential fill)" echo "The password is: $password" Which looks like this when you run it: $ ./prompt.sh Password for 'https://thcipriani@example.com': The password is: hunter2 What did git credentials fill do? Accepted a protocol, username, and host on standard input. Called out to my git credential helper My credential helper checked for credentials matching https://thcipriani@example.com and found nothing Since my credential helper came up empty, it prompted me for my password Finally, it echoed <key>=<value>\n pairs for the keys protocol, host, username, and password to standard output. If I want, I can tell my credential helper to store the information I entered: git credential approve <<EOF protocol=$protocol username=$username host=$host password=$password EOF If I do that, the next time I run the script, it finds the password without prompting: $ ./prompt.sh The password is: hunter2 What are git credentials? Surprisingly, the intended purpose of git credentials is NOT “a weird way to prompt for passwords.” The problem git credentials solve is this: With git over ssh, you use your keys. With git over https, you type a password. Over and over and over. Beleaguered git maintainers solved this dilemma with the credential storage system—git credentials. With the right configuration, git will stop asking for your password when you push to an https remote. Instead, git credentials retrieve and send auth info to remotes. On the labyrinthine options of git credentials My mind initially refused to learn git credentials due to its twisty maze of terms that all sound alike: git credential fill: how you invoke a user’s configured git credential helper git credential approve: how you save git credentials (if this is supported by the user’s git credential helper) git credential.helper: the git config that points to a script that poops out usernames and passwords. These helper scripts are often named git-credential-<something>. git-credential-cache: a specific, built-in git credential helper that caches credentials in memory for a while. git-credential-store: STOP. DON’T TOUCH. This is a specific, built-in git credential helper that stores credentials in cleartext in your home directory. Whomp whomp. git-credential-manager: a specific and confusingly named git credential helper from Microsoft®. If you’re on Linux or Mac, feel free to ignore it. But once I mapped the terms, I only needed to pick a git credential helper. Configuring good credential helpers The built-in git-credential-store is a bad credential helper—it saves your passwords in cleartext in ~/.git-credentials.1 If you’re on a Mac, you’re in luck2—one command points git credentials to your keychain: git config --global credential.helper osxkeychain Third-party developers have contributed helpers for popular password stores: 1Password pass: the standard Unix password manager OAuth Git’s documentation contains a list of credential-helpers, too Meanwhile, Linux and Windows have standard options. Git’s source repo includes helpers for these options in the contrib directory. On Linux, you can use libsecret. Here’s how I configured it on Debian: sudo apt install libsecret-1-0 libsecret-1-dev cd /usr/share/doc/git/contrib/credential/libsecret/ sudo make sudo mv git-credential-libsecret /usr/local/bin/ git config --global credential.helper libsecret On Windows, you can use the confusingly named git credential manager. I have no idea how to do this, and I refuse to learn. Now, if you clone a repo over https, you can push over https without pain3. Plus, you have a handy trick for shell scripts. git-credential-store is not a git credential helper of honor. No highly-esteemed passwords should be stored with it. This message is a warning about danger. The danger is still present, in your time, as it was in ours.↩︎ I think. I only have Linux computers to test this on, sorry ;_;↩︎ Or the config option pushInsteadOf, which is what I actually do.↩︎

9 months ago • 59 votes

More in programming

How Cursor Indexes Codebases Fast

Merkle Trees in the real world

16 hours ago • 4 votes

a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that. It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years. Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!Immix I started with the idea of an -style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into , but as it did so a funny thing happened: the as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.MMTkWhippetmark-sweep collector that I prototyped The is funny, because it defines itself as a new class of collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!Immix papermark-regionregionsoptimistic evacuation However what, really, are the regions referred to by ? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.mark-region This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule? That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a .recent paper submission to ISMM’25 Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal. And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an , but, you know, just a try. I’ll try again sometime.80% paper In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards! whippet regions? paper??!?

2 days ago • 4 votes

Stress And Programming

Having spent four decades as a programmer in various industries and situations, I know that modern software development processes are far more stressful than when I started. It's not simply that developing software today is more complex than it was back in 1981. In that early decade, none

2 days ago • 5 votes

Espressif’s Automatic Reset

In previous articles, we saw how to use “real” UART, and looked into the trick used by Arduino to automatically reset boards when uploading firmware. Today, we’ll look into how Espressif does something similar, using even more tricks. “Real” UART on the Saola As usual, let’s first simply connect the UART adapter. Again, we connect … Continue reading Espressif’s Automatic Reset → The post Espressif’s Automatic Reset appeared first on Quentin Santos.

2 days ago • 4 votes

How I built a chatbot with my dog

Lessons for AI prompting and retrieval

2 days ago • 4 votes

New here?