Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
38
Loss and Distance For the past couple of months, my Facebook usage has started to diminish. In the past, I used to post quite a bit, and I dare say probably 10 years ago to the point of oversharing. It seems to me that the popularity of Facebook has been dropping in my network to […]
over a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Dan Quach Blog

A Data Engineering Perspective of LLMs

Data engineering is a field I would categorize as a subspecialty of software engineering. It shares the same concerns as software engineering—scalability, maintainability, and other “-ilities”—but its primary focus is on data. It’s a unique discipline because data is inherently messy, and as a result, no standard enterprise framework has emerged to dominate the space—and […]

a month ago 40 votes
Europe

These are a mix of stories from Norway, Croatia, and Slovenia from some past trips. Make Serbia Great Again Our first stop is the city of Dubrovnik in Croatia, which has become well known being the primary filming location for Game of Thrones seasons 2 – 8.  I watched the series, and am kind of […]

3 months ago 33 votes
State of Data Engineering 2025 Q1

AI Updates There is a lot of chatter about 2025 being the year of agentic frameworks.  To me, this means a system in which a subset can allow AI models to take independent actions based on their environment, typically interacting with external APIs or interfaces.  The terminology around this concept is still evolving, and definitions […]

3 months ago 28 votes
State of Data Engineering Q3 2024

Prompt Engineering – Meta Analysis Whitepaper One of my favorite AI podcasts, Latent Space, recently featured Sander Schulhoff, one of the authors of a comprehensive research paper on prompt engineering. This meta-study reviews over 1,600 published papers, with co-authors from OpenAI, Microsoft, and Stanford.  [podcast] https://www.latent.space/p/learn-prompting   [whitepaper] https://arxiv.org/abs/2406.06608   The whitepaper is an interesting academic deep dive into prompting, […]

6 months ago 63 votes
Braces

Growing up I had the same dentist from childhood to adulthood. My dentist’s office was run by Dentist Chung (in Vietnamese I called him Bác Sĩ Chung – which means Dr Chung translated directly) and his sister running the office. The office was in Garden Grove, in between the Korean and Vietnamese districts. Walking in […]

10 months ago 77 votes

More in programming

How Cursor Indexes Codebases Fast

Merkle Trees in the real world

18 hours ago 4 votes
a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that. It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years. Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!Immix I started with the idea of an -style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into , but as it did so a funny thing happened: the as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.MMTkWhippetmark-sweep collector that I prototyped The is funny, because it defines itself as a new class of collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!Immix papermark-regionregionsoptimistic evacuation However what, really, are the regions referred to by ? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.mark-region This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule? That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a .recent paper submission to ISMM’25 Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal. And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an , but, you know, just a try. I’ll try again sometime.80% paper In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards! whippet regions? paper??!?

2 days ago 5 votes
Stress And Programming

Having spent four decades as a programmer in various industries and situations, I know that modern software development processes are far more stressful than when I started. It's not simply that developing software today is more complex than it was back in 1981. In that early decade, none

2 days ago 5 votes
Espressif’s Automatic Reset

In previous articles, we saw how to use “real” UART, and looked into the trick used by Arduino to automatically reset boards when uploading firmware. Today, we’ll look into how Espressif does something similar, using even more tricks. “Real” UART on the Saola As usual, let’s first simply connect the UART adapter. Again, we connect … Continue reading Espressif’s Automatic Reset → The post Espressif’s Automatic Reset appeared first on Quentin Santos.

2 days ago 4 votes
How I built a chatbot with my dog

Lessons for AI prompting and retrieval

2 days ago 4 votes