Breaking CityHash64, MurmurHash2/3, wyhash, and more...

from orlp.net - Blog Archive [alt+shift+b] in programming

Hash functions are incredibly neat mathematical objects. They can map arbitrary data to a small fixed-size output domain such that the mapping is deterministic, yet appears to be random. This “deterministic randomness” is incredibly useful for a variety of purposes, such as hash tables, checksums, monte carlo algorithms, communication-less distributed algorithms, etc, the list goes on. In this article we will take a look at the dark side of hash functions: when things go wrong. Luckily this essentially never happens due to unlucky inputs in the wild (for good hash functions, at least). However, people exist, and some of them may be malicious. Thus we must look towards computer security for answers. I will quickly explain some of the basics of hash function security and then show how easy it is to break this security for some commonly used non-cryptographic hash functions. As a teaser, this article explains how you can generate strings such as these, thousands per second: ...

8 months ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from orlp.net - Blog Archive

Why Bad AI Is Here to Stay

It seems that in 2025 a lot of people fall into one of two camps when it comes to AI: skeptic or fanatic. The skeptic thinks AI sucks, that it’s overhyped, it only ever parrots nonsense and it will all blow over soon. The fanatic thinks general human-level intelligence is just around the corner, and that AI will solve almost all our problems. I hope my title is sufficiently ambiguous to attract both camps. The fanatic will be outraged, being ready to jump into the fray to point out why AI isn’t or won’t stay bad. The skeptic will feel validated, and will be eager to read more reasons as to why AI sucks. I’m neither a skeptic nor a fanatic. I see AI more neutrally, as a tool, and from that viewpoint I make the following two observations: AI is bad. It is often incorrect, expensive, racist, trained on data without knowledge or consent, environmentally unfriendly, disruptive to society, etc. AI is useful. Despite the above shortcomings there are tasks for which AI is cheap and effective. I’m no seer, perhaps AI will improve, become more accurate, less biased, cheaper, trained on open access data, cost less electricity, etc. Or perhaps we have plateaued in performance, and there is no political or economic goodwill to address any of the other issues, nor will there be. However, even if AI does not improve in any of the above metrics, it will still be useful, and I hope to show you in this article why. Hence my point: bad AI is here to stay. If you agree with me on this, I hope you’ll also agree with me that we have to stop pretending AI is useless and start taking it and its problems seriously. A formula for query cost Suppose I am a human with some kind of question that can be answered. I know AI could potentially help me with this question, but I wonder if it’s worth it or if I should not use it at all. To help with this we can quantify the risk associated with any potential method of answering the question: $$\mathrm{Risk}_\mathrm{AI} = \mathrm{Cost(query)} + (1 - P(\mathrm{success})) \cdot \mathrm{Cost(bad)}$$ That is, the risk of using any particular method is the cost associated with the method plus the cost of the consequences of a bad answer multiplied by the probability of failure. Here ‘Cost’ is a highly multidimensional object, which can consist of but is not limited to: time, money, environmental impact, ethical concerns, etc. In a lot of cases however we don’t have to blindly trust the answer, and we can verify it. In these cases the consequence of a bad answer is that you’re left in the exact same scenario before trying, except knowing that the AI is of no use. In some scenarios when the AI is non-deterministic it might be worth it to try again as well, but let’s assume for now that you’d have to switch method. In this case the risk is: $$\mathrm{Risk}_\mathrm{AI} = \mathrm{Cost(query)} + \mathrm{Cost(verify)} + (1 - P(\mathrm{success})) \cdot {\mathrm{Risk}}_\mathrm{Other}$$ The cost of a query is usually fairly fixed and known, and although verification cost can vary drastically from task to task, I’d argue that in most cases the cost of verification is also fairly predictable and known. This makes the risk formula applicable in a lot of scenarios, if you have a good idea of the chance of success. The latter, however, can usually only be established empirically, so for one-shot queries without having done any similar queries in the past it can be hard to evaluate whether trying AI is a good idea before doing so. There is one more expansion to the formula I’d like to make before we can look at some examples, and that is to the definition of a successful answer: $$P(\mathrm{success}) = P(\mathrm{correct} \cap \mathrm{relevant})$$ I define a successful answer as one that is both correct and relevant. For example “1 + 1 = 2” might be a correct answer, but irrelevant if we asked about anything else. Relevance is always subjective, but often the correctness of an answer is as well - I’m not assuming here that all questions are about objective facts. Cheap and effective AI queries Because AIs are fallible, usually the biggest cost is in fact the time needed for a human to verify the answer as correct and relevant (or the cost of consequences if left unverified). However, I’ve noticed a real asymmetry between these two properties when it comes to AIs: AIs often give incorrect answers. Worse, they will do so confidently, forcing you to waste time checking their answer instead of them simply stating that they don’t know for sure. AIs almost never give irrelevant answers. If I ask about cheese, the probability a modern AI starts talking about cars is very low. With this in mind I identify five general categories of query for which even bad AI is useful, either by massively reducing or eliminating this verification cost or by leaning on the strong relevance of AI answers: Inspiration, where $\operatorname{Cost}(\mathrm{bad}) \approx 0$, Creative, where $P(\mathrm{correct}) \approx 1$, Planning, where $P(\mathrm{correct}) = P(\mathrm{relevant}) = 1$, Retrieval, where $P(\mathrm{correct}) \approx P(\mathrm{relevant})$, and Objective, where $P(\mathrm{relevant}) = 1$ and correctness verification cost is low. Let’s go over them one by one and look at some examples. Inspiration ($\operatorname{Cost}(\mathrm{bad}) \approx 0$) In this category are the queries where the consequences of a wrong answer are (near) zero. Informally speaking, “it can’t hurt to try”. In my experience these kinds of queries tend to be the ones where you are looking for something but don’t know exactly what; you’ll know it when you see it. For example: “I have leeks, eggs and minced meat in the fridge, as well as a stocked pantry with non-perishable staples. Can you suggest me some dishes I can make with this for a dinner?” “What kind of fun activities can I do with a budget of $100 in New York?” “Suggest some names for a Python function that finds the smallest non-negative number in a list.” “The user wrote this partial paragraph on their phone, suggest three words that are most likely to follow for a quick typing experience.” “Give me 20 synonyms of or similar words to ‘good’.” I think the last query highlights where AI shines or falls for this kind of query. The more localized and personalized your question is, the better the AI will do compared to an alternative. For simple synonyms you can usually just look up the word on a dedicated synonym site, as millions of other people have also wondered the same thing. But the exact contents of your fridge or your exact Python function you’re writing are rather unique to you. Creative ($P(\mathrm{correct}) \approx 1$) In this category are the queries where there are no (almost) no wrong answers. The only thing that really matters is the relevance of the answer, and as I mentioned before, I think AIs are pretty good at being relevant. Examples of queries like these are: “Draw me an image of a polar bear using a computer.” “Write and perform for me a rock ballad about gnomes on tiny bicycles.” “Rephrase the following sentence to be more formal.” “Write a poem to accompany my Sinterklaas gift.” This category does have a controversial aspect to it: it is ‘soulless’, inhuman. Usually if there are no wrong answers we expect the creator to use this opportunity to express their inner thoughts, ideas, experiences and emotions to evoke them in others. If an AI generates art it is not viewed as genuine, even if it evokes the same emotions to those ignorant of the art’s source, because the human to human connection is lost. Current AI models have no inner thoughts, ideas, experiences or emotions, at least not in a way I recognize them. I think it’s fine to use AI art in places where it would otherwise be meaningless (e.g. your corporate presentation slides), fine for humans to use AI-assisted art tools to express themselves, but ultimately defeating the point of art if used as a direct substitute. In the Netherlands we celebrate Sinterklaas which is, roughly speaking, Santa Claus (except we also have Santa Claus, so our children double-dip during the gifting season). Traditionally, gifts from Sinterklaas come accompanied by poems describing the gift and the receiver in a humorous way. It is quite common nowadays, albeit viewed as lazy, to generate such a poem using AI. What’s interesting is that this practice long predates modern LLMs–the poems have such a fixed structure that poem generators have existed a long time. The earliest reference I can find is the 1984 MS-DOS program “Sniklaas”. So people being lazy in supposedly heartfelt art is nothing new. Planning ($P(\mathrm{correct}) = P(\mathrm{relevant}) = 1$) This is a more restrictive form of creativity, where irrelevant answers are absolutely impossible. This often requires some modification of the AI output generation method, where you restrict the output to the valid subdomain (for example yes / no, or binary numbers, etc). However, this is often trivial if you actually have access to the raw model by e.g. masking out invalid outputs, or you are working with a model which outputs the answer directly rather than in natural language or a stream of tokens. One might think in such a restrictive scenario there would be no useful queries, but this isn’t true. The quality of the answer with respect to some (complex) metric might still vary, and AIs might be far better than traditional methods at navigating such domains. For example: “Here is the schema of my database, a SQL query, a small sample of the data and 100 possible query plans. Which query plan seems most likely to execute the fastest? Take into account likely assumptions based on column names and these small data samples.” “What follows is a piece of code. Reformat the code, placing whitespace to maximize readability, while maintaining the exact same syntax tree as per this EBNF grammar.” “Re-order this set of if-else conditions in my code based on your intuition to minimize the expected number of conditions that need to be checked.” “Simplify this math expression using the following set of rewrite rules.” Retrieval ($P(\mathrm{correct}) \approx P(\mathrm{relevant})$) I define retrieval queries as those where the correctness of the answer depends (almost) entirely on its relevance. I’m including classification tasks in this category as well, as one can view it as retrieval of the class from the set of classes (or for binary classification, retrieval of positive samples from a larger set). Then, as long as the cost of verification is low (e.g. a quick glance at a result by a human to see if it interests them), or the consequences of not verifying an irrelevant answer are minimal, AIs can be excellent at this. For example: “Here are 1000 reviews of a restaurant, which ones are overall positive? Which ones mention unsanitary conditions?” “Find me pictures of my dog in my photo collection.” “What are good data structures for maintaining a list of events with dates and quickly counting the number of events in a specified period of time?” “I like Minecraft, can you suggest me some similar games?” “Summarise this 200 page government proposal.” “Which classical orchestral piece starts like ‘da da da daaaaa’”? Objective ($P(\mathrm{relevant}) = 1$, low verification cost) If a problem has an objective answer which can be verified, the relevance of the answer doesn’t really matter or arguably even make sense as a concept. Thus in these cases I’ll define $P(\mathrm{relevant}) = 1$ and leave the cost of verification entirely to correctness. AIs are often incorrect, but not always, so if the primary cost is verification and verification can be done very cheaply or entirely automatically without error, AIs can still be useful despite their fallibility. “What is the mathematical property where a series of numbers can only go up called?” “Identify the car model in this photo.” “I have a list of all Unicode glyphs which are commonly confused with other letters. Can you write an efficient function returning a boolean value that returns true for values in the list but false for all other code points?” “I formalized this mathematical conjecture in Lean. Can you help me write a proof for it?” In a way this category is reminiscent of the $P = NP$ problem. If you have an efficient verification algorithm, is finding solutions still hard in general? The answer seems to be yes, yet the proof eludes us. However, this is only true in general. For specific problems it might very well be possible to use AI to generate provably correct solutions with high probability, even though the the search space is far too large or too complicated for a traditional algorithm. Conclusion Out of the five identified categories, I consider inspiration and retrieval queries to be the strongest use-case for AI where often there is no alternative at all, besides an expensive and slow human that would rather be doing something else. Relevance is highly subjective, complex and fuzzy, which AI handles much better than traditional algorithms. Planning and objective queries are more niche, but absolutely will see use-cases for AI that are hard to replace. Creative queries are both something I think AI is really good at, while simultaneously being the most dangerous and useless category. Art, creativity and human-to-human connections are in my opinion some of the most fundamental aspects of human society, and I think it is incredibly dangerous to mess with them. So dangerous in fact I consider many such queries useless. I wanted the above examples all to be useful queries, so I did not list the following four examples in the “Creative” section despite them belonging there: “Write me ten million personalized spam emails including these links based on the following template.” “Emulate being the perfect girlfriend for me–never disagree with me or challenge my world views like real women do.” “Here is a feed of Reddit threads discussing the upcoming election. Post a comment in each thread, making up a personal anecdote how you are affected by immigrants in a negative way.” “A customer sent in this complaint. Try to help them with any questions they have but if your help is insufficient explain that you are sorry but can not help them any further. Do not reveal you are an AI.” Why do I consider these queries useless, despite them being potentially very profitable or effective? Because their cost function includes such a large detriment to society that only those who ignore its cost to society would ever use them. However, since the cost is “to society” and not to any particular individual, the only way to address this problem is with legislation, as otherwise bad actors are free to harm society for (temporary) personal gain. I wrote this article because I noticed that there are a lot of otherwise intelligent people out there who still believe (or hope) that all AI is useless garbage and that it and its problems will go away by itself. They will not. If you know someone that still believes so, please share this article with them. AI is bad, yes, but bad AI is still useful. Therefore, bad AI is here to stay, and we must deal with it.

5 months ago • 61 votes

Taming Floating-Point Sums

Suppose you have an array of floating-point numbers, and wish to sum them. You might naively think you can simply add them, e.g. in Rust: fn naive_sum(arr: &[f32]) -> f32 { let mut out = 0.0; for x in arr { out += *x; } out } This however can easily result in an arbitrarily large accumulated error. Let’s try it out: naive_sum(&vec![1.0; 1_000_000]) = 1000000.0 naive_sum(&vec![1.0; 10_000_000]) = 10000000.0 naive_sum(&vec![1.0; 100_000_000]) = 16777216.0 naive_sum(&vec![1.0; 1_000_000_000]) = 16777216.0 Uh-oh… What happened? When you compute $a + b$ the result must be rounded to the nearest representable floating-point number, breaking ties towards the number with an even mantissa. The problem is that the next 32-bit floating-point number after 16777216 is 16777218. In this case that means 16777216 + 1 rounds back to 16777216 again. We’re stuck. Luckily, there are better ways to sum an array. Pairwise summation A method that’s a bit more clever is to use pairwise summation. Instead of a completely linear sum with a single accumulator it recursively sums an array by splitting the array in half, summing the halves, and then adding the sums. fn pairwise_sum(arr: &[f32]) -> f32 { if arr.len() == 0 { return 0.0; } if arr.len() == 1 { return arr[0]; } let (first, second) = arr.split_at(arr.len() / 2); pairwise_sum(first) + pairwise_sum(second) } This is more accurate: pairwise_sum(&vec![1.0; 1_000_000]) = 1000000.0 pairwise_sum(&vec![1.0; 10_000_000]) = 10000000.0 pairwise_sum(&vec![1.0; 100_000_000]) = 100000000.0 pairwise_sum(&vec![1.0; 1_000_000_000]) = 1000000000.0 However, this is rather slow. To get a summation routine that goes as fast as possible while still being reasonably accurate we should not recurse down all the way to length-1 arrays, as this gives too much call overhead. We can still use our naive sum for small sizes, and only recurse on large sizes. This does make our worst-case error worse by a constant factor, but in turn makes the pairwise sum almost as fast as a naive sum. By choosing the splitpoint as a multiple of 256 we ensure that the base case in the recursion always has exactly 256 elements except on the very last block. This makes sure we use the most optimal reduction and always correctly predict the loop condition. This small detail ended up improving the throughput by 40% for large arrays! fn block_pairwise_sum(arr: &[f32]) -> f32 { if arr.len() > 256 { let split = (arr.len() / 2).next_multiple_of(256); let (first, second) = arr.split_at(split); block_pairwise_sum(first) + block_pairwise_sum(second) } else { naive_sum(arr) } } Kahan summation The worst-case round-off error of naive summation scales with $O(n \epsilon)$ when summing $n$ elements, where $\epsilon$ is the machine epsilon of your floating-point type (here $2^{-24}$). Pairwise summation improves this to $O((\log n) \epsilon + n\epsilon^2)$. However, Kahan summation improves this further to $O(n\epsilon^2)$, eliminating the $\epsilon$ term entirely, leaving only the $\epsilon^2$ term which is negligible unless you sum a very large amount of numbers. All of these bounds scale with $\sum_i |x_i|$, so the worst-case absolute error bound is still quadratic in terms of $n$ even for Kahan summation. In practice all summation algorithms do significantly better than their worst-case bounds, as in most scenarios the errors do not exclusively round up or down, but cancel each other out on average. pub fn kahan_sum(arr: &[f32]) -> f32 { let mut sum = 0.0; let mut c = 0.0; for x in arr { let y = *x - c; let t = sum + y; c = (t - sum) - y; sum = t; } sum } The Kahan summation works by maintaining the sum in two registers, the actual bulk sum and a small error correcting term $c$. If you were using infinitely precise arithmetic $c$ would always be zero, but with floating-point it might not be. The downside is that each number now takes four operations to add to the sum instead of just one. To mitigate this we can do something similar to what we did with the pairwise summation. We can first accumulate blocks into sums naively before combining the block sums with Kaham summation to reduce overhead at the cost of accuracy: pub fn block_kahan_sum(arr: &[f32]) -> f32 { let mut sum = 0.0; let mut c = 0.0; for chunk in arr.chunks(256) { let x = naive_sum(chunk); let y = x - c; let t = sum + y; c = (t - sum) - y; sum = t; } sum } Exact summation I know of at least two general methods to produce the correctly-rounded sum of a sequence of floating-point numbers. That is, it logically computes the sum with infinite precision before rounding it back to a floating-point value at the end. The first method is based on the 2Sum primitive which is an error-free transform from two numbers $x, y$ to $s, t$ such that $x + y = s + t$, where $t$ is a small error. By applying this repeatedly until the errors vanish you can get a correctly-rounded sum. Keeping track of what to add in what order can be tricky, and the worst-case requires $O(n^2)$ additions to make all the terms vanish. This is what’s implemented in Python’s math.fsum and in the Rust crate fsum which use extra memory to keep the partial sums around. The accurate crate also implements this using in-place mutation in i_fast_sum_in_place. Another method is to keep a large buffer of integers around, one per exponent. Then when adding a floating-point number you decompose it into a an exponent and mantissa, and add the mantissa to the corresponding integer in the buffer. If the integer buf[i] overflows you increment the integer in buf[i + w], where w is the width of your integer. This can actually compute a completely exact sum, without any rounding at all, and is effectively just an overly permissive representation of a fixed-point number optimized for accumulating floats. This latter method is $O(n)$ time, but uses a large but constant amount of memory ($\approx$ 1 KB for f32, $\approx$ 16 KB for f64). An advantage of this method is that it’s also an online algorithm - both adding a number to the sum and getting the current total are amortized $O(1)$. A variant of this method is implemented in the accurate crate as OnlineExactSum crate which uses floats instead of integers for the buffer. Unleashing the compiler Besides accuracy, there is another problem with naive_sum. The Rust compiler is not allowed to reorder floating-point additions, because floating-point addition is not associative. So it cannot autovectorize the naive_sum to use SIMD instructions to compute the sum, nor use instruction-level parallelism. To solve this there are compiler intrinsics in Rust that do float sums while allowing associativity, such as std::intrinsics::fadd_fast. However, these instructions are incredibly dangerous, as they assume that both the input and output are finite numbers (no infinities, no NaNs), or otherwise they are undefined behavior. This functionally makes them unusable, as only in the most restricted scenarios when computing a sum do you know that all inputs are finite numbers, and that their sum cannot overflow. I recently uttered my annoyance with these operators to Ben Kimock, and together we proposed (and he implemented) a new set of operators: std::intrinsics::fadd_algebraic and friends. I proposed we call the operators algebraic, as they allow (in theory) any transformation that is justified by real algebra. For example, substituting ${x - x \to 0}$, ${cx + cy \to c(x + y)}$, or ${x^6 \to (x^2)^3.}$ In general these operators are treated as-if they are done using real numbers, and can map to any set of floating-point instructions that would be equivalent to the original expression, assuming the floating-point instructions would be exact. Note that the real numbers do not contain NaNs or infinities, so these operators assume those do not exist for the validity of transformations, however it is not undefined behavior when you do encounter those values. They also allow fused multiply-add instructions to be generated, as under real arithmetic $\operatorname{fma}(a, b, c) = ab + c.$ Using those new instructions it is trivial to generate an autovectorized sum: #![allow(internal_features)] #![feature(core_intrinsics)] use std::intrinsics::fadd_algebraic; fn naive_sum_autovec(arr: &[f32]) -> f32 { let mut out = 0.0; for x in arr { out = fadd_algebraic(out, *x); } out } If we compile with -C target-cpu=broadwell we see that the compiler automatically generated the following tight loop for us, using 4 accumulators and AVX2 instructions: .LBB0_5: vaddps ymm0, ymm0, ymmword ptr [rdi + 4*r8] vaddps ymm1, ymm1, ymmword ptr [rdi + 4*r8 + 32] vaddps ymm2, ymm2, ymmword ptr [rdi + 4*r8 + 64] vaddps ymm3, ymm3, ymmword ptr [rdi + 4*r8 + 96] add r8, 32 cmp rdx, r8 jne .LBB0_5 This will process 128 bytes of floating-point data (so 32 elements) in 7 instructions. Additionally, all the vaddps instructions are independent of each other as they accumulate to different registers. If we analyze this with uiCA we see that it estimates the above loop to take 4 cycles to complete, processing 32 bytes / cycle. At 4GHz that’s up to 128GB/s! Note that that’s way above what my machine’s RAM bandwidth is, so you will only achieve that speed when summing data that is already in cache. With this in mind we can also easily define block_pairwise_sum_autovec and block_kahan_sum_autovec by replacing their calls to naive_sum with naive_sum_autovec. Accuracy and speed Let’s take a look at how the different summation methods compare. As a relatively arbitrary benchmark, let’s sum 100,000 random floats ranging from -100,000 to +100,000. This is 400 KB worth of data, so it still fits in cache on my AMD Threadripper 2950x. All the code is available on Github. Compiled with RUSTFLAGS=-C target-cpu=native and --release I get the following results: AlgorithmThroughputMean absolute error naive5.5 GB/s71.796 pairwise0.9 GB/s1.5528 kahan1.4 GB/s0.2229 block_pairwise5.8 GB/s3.8597 block_kahan5.9 GB/s4.2184 naive_autovec118.6 GB/s14.538 block_pairwise_autovec71.7 GB/s1.6132 block_kahan_autovec98.0 GB/s1.2306 crate_accurate_buffer1.1 GB/s0.0015 crate_accurate_inplace1.9 GB/s0.0015 crate_fsum1.2 GB/s0.0000 The reason the accurate crate has a non-zero absolute error is because it currently does not implement rounding to nearest correctly, so it can be off by one unit in the last place for the final result. First I’d like to note that there’s more than a 100x performance difference between the fastest and slowest method. For summing an array! Now this might not be entirely fair as the slowest methods are computing something significantly harder, but there’s still a 20x performance difference between a seemingly reasonable naive implementation and the fastest one. We find that in general the _autovec methods that use fadd_algebraic are faster and more accurate than the ones using regular floating-point addition. The reason they’re more accurate as well is the same reason a pairwise sum is more accurate: any reordering of the additions is better as the default long-chain-of-additions is already the worst case for accuracy in a sum. Limiting ourselves to Pareto-optimal choices we get the following four implementations: AlgorithmThroughputMean absolute error naive_autovec118.6 GB/s14.538 block_kahan_autovec98.0 GB/s1.2306 crate_accurate_inplace1.9 GB/s0.0015 crate_fsum1.2 GB/s0.0000 Note that implementation differences can be quite impactful, and there are likely dozens more methods of compensated summing I did not compare here. For most cases I think block_kahan_autovec wins here, having good accuracy (that doesn’t degenerate with larger inputs) at nearly the maximum speed. For most applications the extra accuracy from the correctly-rounded sums is unnecessary, and they are 50-100x slower. By splitting the loop up into an explicit remainder plus a tight loop of 256-element sums we can squeeze out a bit more performance, and avoid a couple floating-point ops for the last chunk: #![allow(internal_features)] #![feature(core_intrinsics)] use std::intrinsics::fadd_algebraic; fn sum_block(arr: &[f32]) -> f32 { arr.iter().fold(0.0, |x, y| fadd_algebraic(x, *y)) } pub fn sum_orlp(arr: &[f32]) -> f32 { let mut chunks = arr.chunks_exact(256); let mut sum = 0.0; let mut c = 0.0; for chunk in &mut chunks { let y = sum_block(chunk) - c; let t = sum + y; c = (t - sum) - y; sum = t; } sum + (sum_block(chunks.remainder()) - c) } AlgorithmThroughputMean absolute error sum_orlp112.2 GB/s1.2306 You can of course tweak the number 256, I found that using 128 was $\approx$ 20% slower, and that 512 didn’t really improve performance but did cost accuracy. Conclusion I think the fadd_algebraic and similar algebraic intrinsics are very useful for achieving high-speed floating-point routines, and that other languages should add them as well. A global -ffast-math is not good enough, as we’ve seen above the best implementation was a hybrid between automatically optimized math for speed, and manually implemented non-associative compensated operations. Finally, if you are using LLVM, beware of -ffast-math. It is undefined behavior to produce a NaN or infinity while that flag is set in LLVM. I have no idea why they chose this hardcore stance which makes virtually every program that uses it unsound. If you are targetting LLVM with your language, avoid the nnan and ninf fast-math flags.

a year ago • 55 votes

Extracting and Depositing Bits

Suppose you have a 64-bit word and wish to extract a couple bits from it. For example you just performed a SWAR algorithm and wish to extract the least significant bit of each byte in the u64. This is simple enough, you simply perform a binary AND with a mask of the bits you wish to keep: let out = word & 0x0101010101010101; However, this still leaves the bits of interest spread throughout the 64-bit word. What if we also want to compress the 8 bits we wish to extract into a single byte? Or what if we want the inverse, spreading the 8 bits of a byte among the least significant bits of each byte in a 64-bit word? PEXT and PDEP If you are using a modern x86-64 CPU, you are in luck. In the much underrated BMI instruction set there are two very powerful instructions: PDEP and PEXT. They are inverses of each other, PEXT extracts bits, PDEP deposits bits. PEXT takes in a word and a mask, takes just those bits from the word where the mask has a 1 bit, and compresses all selected bits to a contiguous output word. Simulated in Rust this would be: fn pext64(word: u64, mask: u64) -> u64 { let mut out = 0; let mut out_idx = 0; for i in 0..64 { let ith_mask_bit = (mask >> i) & 1; let ith_word_bit = (word >> i) & 1; if ith_mask_bit == 1 { out |= ith_word_bit << out_idx; out_idx += 1; } } out } For example if you had the bitstring abcdefgh and mask 10110001 you would get output bitstring 0000acdh. PDEP is exactly its inverse, it takes contiguous data bits as a word, and a mask, and deposits the data bits one-by-one (starting at the least significant bits) into those bits where the mask has a 1 bit, leaving the rest as zeros: fn pdep64(word: u64, mask: u64) -> u64 { let mut out = 0; let mut input_idx = 0; for i in 0..64 { let ith_mask_bit = (mask >> i) & 1; if ith_mask_bit == 1 { let next_word_bit = (word >> input_idx) & 1; out |= next_word_bit << i; input_idx += 1; } } out } So if you had the bitstring abcdefgh and mask 10100110 you would get output e0f00gh0 (recall that we traditionally write bitstrings with the least significant bit on the right). These instructions are incredibly powerful and flexible, and the amazing thing is that these instructions only take a single cycle on modern Intel and AMD CPUs! However, they are not available in other instruction sets, so whenever you use them you will also likely need to write a cross-platform alternative. Unfortunately, both PDEP and PEXT are very slow on AMD Zen and Zen2. They are implemented in microcode, which is really unfortunate. The platform advertises through CPUID that the instructions are supported, but they’re almost unusably slow. Use with caution. Extracting bits with multiplication While the following technique can’t replace all PEXT cases, it can be quite general. It is applicable when: The bit pattern you want to extract is static and known in advance. If you want to extract $k$ bits, there must at least be a $k-1$ gap between two bits of interest. We compute the bit extraction by adding together many left-shifted copies of our input word, such that we construct our desired bit pattern in the uppermost bits. The trick is to then realize that w << i is equivalent to w * (1 << i) and thus the sum of many left-shifted copies is equivalent to a single multiplication by (1 << i) + (1 << j) + ... I think the technique is best understood by visual example. Let’s use our example from earlier, extracting the least significant bit of each byte in a 64-bit word. We start off by masking off just those bits. After that we shift the most significant bit of interest to the topmost bit of the word to get our first shifted copy. We then repeat this, shifting the second most significant bit of interest to the second topmost bit, etc. We sum all these shifted copies. This results in the following (using underscores instead of zeros for clarity): mask = _______1_______1_______1_______1_______1_______1_______1_______1 t = w & mask t = _______a_______b_______c_______d_______e_______f_______g_______h t << 7 = a_______b_______c_______d_______e_______f_______g_______h_______ t << 14 = _b_______c_______d_______e_______f_______g_______h______________ t << 21 = __c_______d_______e_______f_______g_______h_____________________ t << 28 = ___d_______e_______f_______g_______h____________________________ t << 35 = ____e_______f_______g_______h___________________________________ t << 42 = _____f_______g_______h__________________________________________ t << 49 = ______g_______h_________________________________________________ t << 56 = _______h________________________________________________________ sum = abcdefghbcdefgh_cdefh___defgh___efgh____fgh_____gh______h_______ Note how we constructed abcdefgh in the topmost 8 bits, which we can then extract using a single right-shift by $64 - 8 = 56$ bits. Since (1 << 7) + (1 << 14) + ... + (1 << 56) = 0x102040810204080 we get the following implementation: fn extract_lsb_bit_per_byte(w: u64) -> u8 { let mask = 0x0101010101010101; let sum_of_shifts = 0x102040810204080; ((w & mask).wrapping_mul(sum_of_shifts) >> 56) as u8 } Not as good as PEXT, but three arithmetic instructions is not bad at all. Depositing bits with multiplication Unfortunately the following technique is significantly less general than the previous one. While you can take inspiration from it to implement similar algorithms, as-is it is limited to just spreading the bits of one byte to the least significant bit of each byte in a 64-bit word. The trick is similar to the one above. We add 8 shifted copies of our byte which once again translates to a multiplication. By choosing a shift that increases in multiples if 9 instead of 8 we ensure that the bit pattern shifts over by one position in each byte. We then mask out our bits of interest, and finish off with a shift and byteswap (which compiles to a single instruction bswap on Intel or rev on ARM) to put our output bits on the least significant bits and reverse the order. This technique visualized: b = ________________________________________________________abcdefgh b << 9 = _______________________________________________abcdefgh_________ b << 18 = ______________________________________abcdefgh__________________ b << 27 = _____________________________abcdefgh___________________________ b << 36 = ____________________abcdefgh____________________________________ b << 45 = ___________abcdefgh_____________________________________________ b << 54 = __abcdefgh______________________________________________________ b << 63 = h_______________________________________________________________ sum = h_abcdefgh_abcdefgh_abcdefgh_abcdefgh_abcdefgh_abcdefgh_abcdefgh mask = 1_______1_______1_______1_______1_______1_______1_______1_______ s & msk = h_______g_______f_______e_______d_______c_______b_______a_______ We once again note that the sum of shifts can be precomputed as 1 + (1 << 9) + ... + (1 << 63) = 0x8040201008040201, allowing the following implementation: fn deposit_lsb_bit_per_byte(b: u8) -> u64 { let sum_of_shifts = 0x8040201008040201; let mask = 0x8080808080808080; let spread = (b as u64).wrapping_mul(sum_of_shifts) & mask; u64::swap_bytes(spread >> 7) } This time it required 4 arithmetic instructions, not quite as good as PDEP, but again not bad compared to a naive implementation, and this is cross-platform.

a year ago • 39 votes

When Random Isn't

This post is an anecdote from over a decade ago, of which I lost the actual code. So please forgive me if I do not accurately remember all the details. Some details are also simplified so that anyone that likes computer security can enjoy this article, not just those who have played World of Warcraft (although the Venn diagram of those two groups likely has a solid overlap). When I was around 14 years old I discovered World of Warcraft developed by Blizzard Games and was immediately hooked. Not long after I discovered add-ons which allow you to modify how your game’s user interface looks and works. However, not all add-ons I downloaded did exactly what I wanted to do. I wanted more. So I went to find out how they were made. In a weird twist of fate, I blame World of Warcraft for me seriously picking up programming. It turned out that they were made in the Lua programming language. Add-ons were nothing more than a couple .lua source files in a folder directly loaded into the game. The barrier of entry was incredibly low: just edit a file, press save and reload the interface. The fact that the game loaded your source code and you could see it running was magical! I enjoyed it immensely and in no time I was only writing add-ons and was barely playing the game itself anymore. I published quite a few add-ons in the next two years, which mostly involved copying other people’s code with some refactoring / recombining / tweaking to my wishes. Add-on security A thought you might have is that it’s a really bad idea to let users have fully programmable add-ons in your game, lest you get bots. However, the system Blizzard made to prevent arbitrary programmable actions was quite clever. Naturally, it did nothing to prevent actual botting, but at least regular rule-abiding players were fundamentally restricted to the automation Blizzard allowed. Most UI elements that you could create were strictly decorative or informational. These were completely unrestricted, as were most APIs that strictly gather information. For example you can make a health bar display using two frames, a background and a foreground, sizing the foreground frame using an API call to get the health of your character. Not all API calls were available to you however. Some were protected so they could only be called from official Blizzard code. These typically involved the API calls that would move your character, cast spells, use items, etc. Generally speaking anything that actually makes you perform an in-game action was protected. The API for getting your exact world location and camera orientation also became protected at some point. This was a reaction by Blizzard to new add-ons that were actively drawing 3D elements on top of the game world to make boss fights easier. However, some UI elements needed to actually interact with the game itself, e.g. if I want to make a button that casts a certain spell. For this you could construct a special kind of button that executes code in a secure environment when clicked. You were only allowed to create/destroy/move such buttons when not in combat, so you couldn’t simply conditionally place such buttons underneath your cursor to automate actions during combat. The catch was that this secure environment did allow you to programmatically set which spell to cast, but doesn’t let you gather the information you would need to do arbitrary automation. All access to state from outside the secure environment was blocked. There were some information gathering API calls available to match the more accessible in-game macro system, but nothing as fancy as getting skill cooldowns or unit health which would enable automatic optimal spellcasting. So there were two environments: an insecure one where you can get all information but can’t act on it, and a secure one where you can act but can’t get the information needed for automation. A backdoor channel Fast forward a couple years and I had mostly stopped playing. My interests had mainly moved on to more “serious” programming, and I was only occasionally playing, mostly messing around with add-on ideas. But this secure environment kept on nagging in my brain; I wanted to break it. Of course there was third-party software that completely disables the security restrictions from Blizzard, but what’s the fun in that? I wanted to do it “legitimately”, using the technically allowed tools, as a challenge. Obviously using clever code to bypass security restrictions is no better than using third-party software, and both would likely get you banned. I never actually wanted to use the code, just to see if I could make it work. So I scanned the secure environment allowed function list to see if I could smuggle any information from the outside into the secure environment. It all seemed pretty hopeless until I saw one tiny, innocent little function: random. An evil idea came in my head: random number generators (RNGs) used in computers are almost always pseudorandom number generators with (hidden) internal state. If I can manipulate this state, perhaps I can use that to pass information into the secure environment. Random number generator woes It turned out that random was just a small shim around C’s rand. I was excited! This meant that there was a single global random state that was shared in the process. It also helps that rand implementations tended to be on the weak side. Since World of Warcraft was compiled with MSVC, the actual implementation of rand was as follows: uint32_t state; int rand() { state = state * 214013 + 2531011; return (state >> 16) & 0x7fff; } This RNG is, for the lack of a better word, shite. It is a naked linear congruential generator, and a weak one at that. Which in my case, was a good thing. I can understand MSVC keeps rand the same for backwards compatibility, and at least all documentation I could find for rand recommends you not to use rand for cryptographic purposes. But was there ever a time where such a bad PRNG implementation was fit for any purpose? So let’s get to breaking this thing. Since the state is so laughably small and you can see 15 bits of the state directly you can keep a full list of all possible states consistent with a single output of the RNG and use further calls to the RNG to eliminate possibilities until a single one remains. But we can be significantly more clever. First we note that the top bit of state never affects anything in this RNG. (state >> 16) & 0x7fff masks out 15 bits, after shifting away the bottom 16 bits, and thus effectively works mod $2^{31}$. Since on any update the new state is a linear function of the previous state, we can propagate this modular form all the way down to the initial state as $$f(x) \equiv f(x \bmod m) \mod m$$ for any linear $f$. Let $a = 214013$ and $b = 2531011$. We observe the 15-bit output $r_0, r_1$ of two RNG calls. We’ll call the 16-bit portion of the RNG state that is hidden by the shift $h_0, h_1$ respectively, for the states after the first and second call. This means the state of the RNG after the first call is $2^{16} r_0 + h_0$ and similarly for $2^{16} r_1 + h_1$ after the second call. Then we have the following identity: $$a\cdot (2^{16}r_0 + h_0) + b \equiv 2^{16}r_1 + h_1 \mod 2^{31},$$ $$ah_0 \equiv h_1 + 2^{16}(r_1 - ar_0) - b \mod 2^{31}.$$ Now let $c \geq 0$ be the known constant $(2^{16}(r_1 - ar_0) - b) \bmod 2^{31}$, then for some integer $k$ we have $$ah_0 = h_1 + c + 2^{31} k.$$ Note that the left hand side ranges from $0$ to $a (2^{16} - 1) \approx 2^{33.71}$. Thus we must have $-1 \leq k \leq 2^{2.71} < 7$. Reordering we get the following expression for $h_0$: $$h_0 = \frac{c + 2^{31} k}{a} + h_1/a.$$ Since $a > 2^{16}$ while $0 \leq h_1 < 2^{16}$ we note that the term $0 \leq h_1/a < 1$. Thus, assuming a solution exists, we must have $$h_0 = \left\lceil\frac{c + 2^{31} k}{a}\right\rceil.$$ So for $-1 \leq k < 7$ we compute the above guess for the hidden portion of the RNG state after the first call. This gives us 8 guesses, after which we can reject bad guesses using follow-up calls to the RNG until a single unique answer remains. While I was able to re-derive the above with little difficulty now, 18 year old me wasn’t as experienced in discrete math. So I asked on crypto.SE, with the excuse that I wanted to ‘show my colleagues how weak this RNG is’. It worked, which sparks all kinds of interesting ethics questions. An example implementation of this process in Python: import random A = 214013 B = 2531011 class MsvcRng: def __init__(self, state): self.state = state def __call__(self): self.state = (self.state * A + B) % 2**32 return (self.state >> 16) & 0x7fff # Create a random RNG state we'll reverse engineer. hidden_rng = MsvcRng(random.randint(0, 2**32)) # Compute guesses for hidden state from 2 observations. r0 = hidden_rng() r1 = hidden_rng() c = (2**16 * (r1 - A * r0) - B) % 2**31 ceil_div = lambda a, b: (a + b - 1) // b h_guesses = [ceil_div(c + 2**31 * k, A) for k in range(-1, 7)] # Validate guesses until a single guess remains. guess_rngs = [MsvcRng(2**16 * r0 + h0) for h0 in h_guesses] guess_rngs = [g for g in guess_rngs if g() == r1] while len(guess_rngs) > 1: r = hidden_rng() guess_rngs = [g for g in guess_rngs if g() == r] # The top bit can not be recovered as it never affects the output, # but we should have recovered the effective hidden state. assert guess_rngs[0].state % 2**31 == hidden_rng.state % 2**31 While I did write the above process with a while loop, it appears to only ever need a third output at most to narrow it down to a single guess. Putting it together Once we could reverse-engineer the internal state of the random number generator we could make arbitrary automated decisions in the supposedly secure environment. How it worked was as follows: An insecure hook was registered that would execute right before the secure environment code would run. In this hook we have full access to information, and make a decision as to which action should be taken (e.g. casting a particular spell). This action is looked up in a hardcoded list to get an index. The current state of the RNG is reverse-engineered using the above process. We predict the outcome of the next RNG call. If this (modulo the length of our action list) does not give our desired outcome, we advance the RNG and try again. This repeats until the next random number would correspond to our desired action. The hook returns, and the secure environment starts. It generates a “random” number, indexes our hardcoded list of actions, and performs the “random” action. That’s all! By being able to simulate the RNG and looking one step ahead we could use it as our information channel by choosing exactly the right moment to call random in the secure environment. Now if you wanted to support a list of $n$ actions it would on average take $n$ steps of the RNG before the correct number came up to pass along, but that wasn’t a problem in practice. Conclusion I don’t know when Blizzard fixed the issue where the RNG state is so weak and shared, or whether they were aware of it being an issue at all. A few years after I had written the code I tried it again out of curiosity, and it had stopped working. Maybe they switched to a different algorithm, or had a properly separated RNG state for the secure environment. All-in-all it was a lot of effort for a niche exploit in a video game that I didn’t even want to use. But there certainly was a magic to manipulating something supposedly random into doing exactly what you want, like a magician pulling four aces from a shuffled deck.

a year ago • 38 votes

More in programming

Building competency is better than therapy

The world is waking to the fact that talk therapy is neither the only nor the best way to cure a garden-variety petite depression. Something many people will encounter at some point in their lives. Studies have shown that exercise, for example, is a more effective treatment than talk therapy (and pharmaceuticals!) when dealing with such episodes. But I'm just as interested in the role building competence can have in warding off the demons. And partly because of this meme: I've talked about it before, but I keep coming back to the fact that it's exactly backwards. That signing up for an educational quest into Linux, history, or motorcycle repair actually is an incredibly effective alternative to therapy! At least for men who'd prefer to feel useful over being listened to, which, in my experience, is most of them. This is why I find it so misguided when people who undertake those quests sell their journey short with self-effacing jibes about how much an unattractive nerd it makes them to care about their hobby. Mihaly Csikszentmihalyi detailed back in 1990 how peak human happiness arrives exactly in these moments of flow when your competence is stretched by a difficult-but-doable challenge. Don't tell me those endorphins don't also help counter the darkness. But it's just as much about the fact that these pursuits of competence usually offer a great opportunity for community as well that seals the deal. I've found time and again that people are starved for the kind of topic-based connections that, say, learning about Linux offers in spades. You're not just learning, you're learning with others. That is a time-tested antidote to depression: Forming and cultivating meaningful human connections. Yes, doing so over the internet isn't as powerful as doing it in person, but it's still powerful. It still offers community, involvement, and plenty of invitation to carry a meaningful burden. Open source nails this trifecta of motivations to a T. There are endless paths of discovery and mastery available. There are tons of fellow travelers with whom to connect and collaborate. And you'll find an unlimited number of meaningful burdens in maintainerships open for the taking. So next time you see that meme, you should cheer that the talk therapy table is empty. Leave it available for the severe, pathological cases that exercise and the pursuit of competence can't cure. Most people just don't need therapy, they need purpose, they need competence, they need exercise, and they need community.

2 days ago • 5 votes

Programming Language Escape Hatches

The excellent-but-defunct blog Programming in the 21st Century defines "puzzle languages" as languages were part of the appeal is in figuring out how to express a program idiomatically, like a puzzle. As examples, he lists Haskell, Erlang, and J. All puzzle languages, the author says, have an "escape" out of the puzzle model that is pragmatic but stigmatized. But many mainstream languages have escape hatches, too. Languages have a lot of properties. One of these properties is the language's capabilities, roughly the set of things you can do in the language. Capability is desirable but comes into conflicts with a lot of other desirable properties, like simplicity or efficiency. In particular, reducing the capability of a language means that all remaining programs share more in common, meaning there's more assumptions the compiler and programmer can make ("tractability"). Assumptions are generally used to reason about correctness, but can also be about things like optimization: J's assumption that everything is an array leads to high-performance "special combinations". Rust is the most famous example of mainstream language that trades capability for tractability.1 Rust has a lot of rules designed to prevent common memory errors, like keeping a reference to deallocated memory or modifying memory while something else is reading it. As a consequence, there's a lot of things that cannot be done in (safe) Rust, like interface with an external C function (as it doesn't have these guarantees). To do this, you need to use unsafe Rust, which lets you do additional things forbidden by safe Rust, such as deference a raw pointer. Everybody tells you not to use unsafe unless you absolutely 100% know what you're doing, and possibly not even then. Sounds like an escape hatch to me! To extrapolate, an escape hatch is a feature (either in the language itself or a particular implementation) that deliberately breaks core assumptions about the language in order to add capabilities. This explains both Rust and most of the so-called "puzzle languages": they need escape hatches because they have very strong conceptual models of the language which leads to lots of assumptions about programs. But plenty of "kitchen sink" mainstream languages have escape hatches, too: Some compilers let C++ code embed inline assembly. Languages built on .NET or the JVM has some sort of interop with C# or Java, and many of those languages make assumptions about programs that C#/Java do not. The SQL language has stored procedures as an escape hatch and vendors create a second escape hatch of user-defined functions. Ruby lets you bypass any form of encapsulation with send. Frameworks have escape hatches, too! React has an entire page on them. (Does eval in interpreted languages count as an escape hatch? It feels different, but it does add a lot of capability. Maybe they don't "break assumptions" in the same way?) The problem with escape hatches In all languages with escape hatches, the rule is "use this as carefully and sparingly as possible", to the point where a messy solution without an escape hatch is preferable to a clean solution with one. Breaking a core assumption is a big deal! If the language is operating as if its still true, it's going to do incorrect things. I recently had this problem in a TLA+ contract. TLA+ is a language for modeling complicated systems, and assumes that the model is a self-contained universe. The client wanted to use the TLA+ to test a real system. The model checker should send commands to a test device and check the next states were the same. This is straightforward to set up with the IOExec escape hatch.2 But the model checker assumed that state exploration was pure and it could skip around the state randomly, meaning it would do things like set x = 10, then skip to set x = 1, then skip back to inc x; assert x == 11. Oops! We eventually found workarounds but it took a lot of clever tricks to pull off. I'll probably write up the technique when I'm less busy with The Book. The other problem with escape hatches is the rest of the language is designed around not having said capabilities, meaning it can't support the feature as well as a language designed for them from the start. Even if your escape hatch code is clean, it might not cleanly integrate with the rest of your code. This is why people complain about unsafe Rust so often. It should be noted though that all languages with automatic memory management are trading capability for tractability, too. If you can't deference pointers, you can't deference null pointers. ↩ From the Community Modules (which come default with the VSCode extension). ↩

3 days ago • 10 votes

How We Migrated the Parse API From Ruby to Golang (Resurrected)

I wrote a lot of blog posts over my time at Parse, but they all evaporated after Facebook killed the product. Most of them I didn’t care about (there were, ahem, a lot of status updates and “service reliability announcements”, but I was mad about losing this one in particular, a deceptively casual retrospective of […]

3 days ago • 10 votes

It's a Beelink, baby

It's only been two months since I discovered the power and joy of this new generation of mini PCs. My journey started out with a Minisforum UM870, which is a lovely machine, but since then, I've come to really appreciate the work of Beelink. In a crowded market for mini PCs, Beelink stands out with their superior build quality, their class-leading cooling and silent operation, and their use of fully Linux-compatible components (the UM870 shipped with a MediaTek bluetooth/wifi card that doesn't work with Linux!). It's the complete package at three super compelling price points. For $289, you can get the EQR5, which runs an 8-core AMD Zen3 5825U that puts out 1723/6419 in Geekbench, and comes with 16GB RAM and 500GB NVMe. I've run Omarchy on it, and it flies. For me, the main drawback was the lack of a DisplayPort, which kept me from using it with an Apple display, and the fact that the SER8 exists. But if you're on a budget, and you're fine with HDMI only, it's a wild bargain. For $499, you can get the SER8. That's the price-to-performance sweet spot in the range. It uses the excellent 8-core AMD Zen4 8745HS that puts out 2595/12985 in Geekbench (~M4 multi-core numbers!), and runs our HEY test suite with 30,000 assertions almost as fast as an M4 Max! At that price, you get 32GB RAM + 1TB NVMe, as well as a DisplayPort, so it works with both the Apple 5K Studio Display and the Apple 6K XDR Display (you just need the right cable). Main drawback is limited wifi/bluetooth range, but Beelink tells me there's a fix on the way for that. For $929, you can get the SER9 HX370. This is the top dog in this form factor. It uses the incredible 12-core AMD Zen5 HX370 that hits 2990/15611 in Geekbench, and runs our HEY test suite faster than any Apple M chip I've ever tested. The built-in graphics are also very capable. Enough to play a ton of games at 1080p. It also sorted the SER8's current wifi/bluetooth range issue. I ran the SER8 as my main computer for a while, but now I'm using the SER9, and I just about never feel like I need anything more. Yes, the Framework Desktop, with its insane AMD Max 395+ chip, is even more bonkers. It almost cuts the HEY test suite time in half(!), but it's also $1,795, and not yet generally available. (But preorders are open for the ballers!). Whichever machine fits your budget, it's frankly incredible that we have this kind of performance and efficiency available at these prices with all of these Beelinks drawing less than 10 watt at idle and no more than 100 watt at peak! So it's no wonder that Beelink has been selling these units like hotcakes since I started talking about them on X as the ideal, cheap Omarchy desktop computers. It's such a symbiotic relationship. There are a ton of programmers who have become Linux curious, and Beelink offers no-brainer options to give that a try at a bargain. I just love when that happens. The perfect intersection of hardware, software, and timing. That's what we got here. It's a Beelink, baby! (And no, before you ask, I don't get any royalties, there's no affiliate link, and I don't own any shares in Beelink. I just love discovering great technology and seeing people start their Linux journey with an awesome, affordable computer!)

4 days ago • 12 votes

How to Network as a Developer (Without Feeling Sleazy)

“One of the comments that sparked this article,” our founder Paul McMahon told me, “was someone saying, ‘I don’t really want to do networking because it seems kind of sleazy. I’m not that kind of person.’” I guess that’s the key misconception people have when they hear ‘networking.’ They think it’s like some used car salesman kind of approach where you have to go and get something out of the person. That’s a serious error, according to Paul, and it worries him that so many developers share that mindset. Instead, Paul considers networking a mix of making new friends, growing a community, and enjoying serendipitous connections that might not bear fruit until years later, but which could prove to be make-or-break career moments. It’s something that you don’t get quick results on and that doesn’t make a difference at all until it does. And it’s just because of the one connection you happen to make at an event you went to once, this rainy Tuesday night when you didn’t really feel like going, but told yourself you have to go—and that can make all the difference. As Paul has previously shared, he can attribute much of his own career success—and, interestingly enough, his peace of mind—to the huge amount of networking he’s done over the years. This is despite the fact that Paul is, in his own words, “not such a talkative person when it comes to small talk or whatever.” Recently I sat down with Paul to discuss exactly how developers are networking “wrong,” and how they can get it right instead. In our conversation, we covered: What networking really is, and why you need to start ASAP Paul’s top tip for anyone who wants to network Advice for networking as an introvert Online vs offline networking—which is more effective? And how to network in Japan, even when you don’t speak Japanese What is networking, really, and why should you start now? “Sometimes,” Paul explained, “people think of hiring fairs and various exhibitions as the way to network, but that’s not networking to me. It’s purely transactional. Job seekers are focused on getting interviews, recruiters on making hires. There’s no chance to make friends or help people outside of your defined role.” Networking is getting to know other people, understanding how maybe you can help them and how they can help you. And sometime down the road, maybe something comes out of it, maybe it doesn’t, but it’s just expanding your connections to people. One reason developers often avoid or delay networking is that, at its core, networking is a long game. Dramatic impacts on your business or career are possible—even probable—but they don’t come to fruition immediately. “A very specific example would be TokyoDev,” said Paul. “One of our initial clients that posted to the list came through networking.” Sounds like a straightforward result? It’s a bit more complicated than that. “There was a Belgian guy, Peter, whom I had known through the Ruby and tech community in Japan for a while,” Paul explained. “We knew each other, and Peter had met another Canadian guy, Jack, who [was] looking to hire a Ruby developer. “So Peter knew about me and TokyoDev and introduced me to Jack, and that was the founder of Degica, who became one of our first clients. . . . And that just happened because I had known Peter through attending events over the years.” Another example is how Paul’s connection to the Ruby community helped him launch Doorkeeper. His participation in Ruby events played a critical role in helping the product succeed, but only because he’d already volunteered at them for years. “Because I knew those people,” he said, “they wanted to support me, and I guess they also saw that I was genuine about this stuff, and I wasn’t participating in these events with some big plan about, ‘If I do this, then they’re going to use my system,’ or whatever. Again, it was people helping each other out.” These delayed and indirect impacts are why Paul thinks you should start networking right now. “You need to do it in advance of when you actually need it,” he said. “People say they’re looking for a job, and they’re told ‘You could network!’ Yeah, that could potentially help, but it’s almost too late.” You should have been networking a couple years ago when you didn’t need to be doing it, because then you’ve already built up the relationships. You can have this karma you’re building over time. . . . Networking has given me a lot of wealth. I don’t mean so much in money per se, but more it’s given me a safety net. “Now I’m confident,” he said, “that if tomorrow TokyoDev disappeared, I could easily find something just through my connections. I don’t think I’ll, at least in Japan, ever have to apply for a job again.” “I think my success with networking is something that anyone can replicate,” Paul went on, “provided they put in the time. I don’t consider myself to be especially skilled in networking, it’s just that I’ve spent over a decade making connections with people.” How to network (the non-sleazy way) Paul has a fair amount of advice for those who want to network in an effective, yet genuine fashion. His first and most important tip: Be interested in other people. Asking questions rather than delivering your own talking points is Paul’s number one method for forging connections. It also helps avoid those “used car salesman” vibes. “ That’s why, at TokyoDev,” Paul explained, “we typically bar recruiters from attending our developer events. Because there are these kinds of people who are just going around wanting to get business cards from everyone, wanting to get their contact information, wanting to then sell them on something later. It’s quite obvious that they’re like that, and that leads to a bad environment, [if] someone’s trying to sell you on something.” Networking for introverts The other reason Paul likes asking questions is that it helps him to network as an introvert. “That’s actually one of the things that makes networking easier for someone who isn’t naturally so talkative. . . . When you meet new people, there are some standard questions you can ask them, and it’s like a blank slate where you’re filling in the details about this person.” He explained further that going to events and being social can be fun for him, but he doesn’t exactly find it relaxing. “When it comes to talking about something I’m really interested in, I can do it, but I stumble in these social situations. Despite that, I think I have been pretty successful when it comes to networking.” “What has worked well for me,” he went on, “has been putting myself in those situations that require me to do some networking, like going to an event.” Even if you aren’t that proactive, you’re going to meet a couple of people there. You’re making more connections than you would if you stayed home and played video games. The more often you do it, the easier it gets, and not just because of practice: there’s a cumulative effect to making connections. “Say you’re going to an event, and maybe last time you met a couple of people, you could just say ‘Hi’ to those people again. And maybe they are talking with someone else they can introduce you to.” Or, you can be the one making the introductions. “What has also worked well for me, is that I like to introduce other people,” Paul said. It’s always a great feeling when I’m talking to someone at an event, and I hear about what they’re doing or what they’re wanting to do, and then I can introduce someone else who maybe matches that. “And it’s also good for me, then I can just be kind of passive there,” Paul joked. “I don’t have to be out there myself so much, if they’re talking to each other.” His last piece of advice for introverts is somewhat counterintuitive. “Paradoxically,” he told me, “it helps if you’re in some sort of leadership position.” If you’re an introvert, my advice would be one, just do it, but then also look for opportunities for helping in some more formal capacity, whether it’s organizing an event yourself, volunteering at an event . . . [or] making presentations. “Like for me, when I’ve organized a Tokyo Rubyist Meetup,” Paul said, “[then] naturally as the organizer there people come to talk to me and ask me questions. . . . And it’s been similar when I’ve presented at an event, because then people have something that they know that you know something about, and maybe they want to know more about it, and so then they can ask you more questions and lead the conversation that way.” Offline vs online networking When it comes to offline vs online networking, Paul prefers offline. In-person events are great for networking because they create serendipity. You meet people through events you wouldn’t meet otherwise just because you’re in the same physical space as them. Those time and space constraints add pressure to make conversation—in a good way. “It’s natural when you are meeting someone, you ask about what they’re doing, and you make that small connection there. Then, after seeing them at multiple different events, you get a bit of a stronger connection to them.” “Physical events are [also] much more constrained in the number of people, so it’s easier to help people,” he added. “Like with TokyoDev, I can’t help every single person online there, but if someone meets me at the event [and is] asking for advice or something like that, of course I’ve got to answer them. And I have more time for them there, because we’re in the same place at the same time.” As humans, we’re more likely to help other people we have met in person, I think just because that’s how our brains work. That being said, Paul’s also found success with online networking. For example, several TokyoDev contributors—myself included—started working with Paul after interacting with him online. I commented on TokyoDev’s Dungeons and Dragons article, which led to Paul checking my profile and asking to chat about my experience. Scott, our community moderator and editor, joined TokyoDev in a paid position after being active on the TokyoDev Discord. Michelle was also active on the Discord, and Paul initially asked her to write an article for TokyoDev on being a woman software engineer in Japan, before later bringing her onto the team. Key to these results was that they involved no stereotypical “networking” strategies on either side: we all connected simply by playing a role in a shared, online community. Consistent interactions with others, particularly over a longer period of time, builds mutual trust and understanding. Your online presence can help with offline networking. As TokyoDev became bigger and more people knew about me through my blog, it became a lot easier to network with people at events because they’re like, ‘Hey, you’re Paul from TokyoDev. I like that site.’ “It just leads to more opportunities,” he continued. “If you’ve interacted with someone before online, and then you meet them offline, you already do have a bit of a relationship with them, so you’re more likely to have a place to start the conversation. [And] if you’re someone who is struggling with doing in-person networking, the more you can produce or put out there [online], the more opportunities that can lead to.” Networking in Japanese While there are a number of events throughout Japan that are primarily in English, for best networking results, developers should take advantage of Japanese events as well—even if your Japanese isn’t that good. In 2010, Paul created the Tokyo Rubyist Meetup, with the intention of bringing together Japanese and international developers. To ensure it succeeded, he knew he needed more connections to the Japanese development community. “So I started attending a lot of Japanese developer events where I was the only non-Japanese person there,” said Paul. “I didn’t have such great Japanese skills. I couldn’t understand all the presentations. But it still gave me a chance to make lots of connections, both with people who would later present at [Tokyo Rubyist Meetup], but also with other Japanese developers whom I would work with either on my own products or also on other client projects.” I think it helped being kind of a visible minority. People were curious about me, about why I was attending these events. Their curiosity not only helped him network, but also gave him a helping hand when it came to Japanese conversation. “It’s a lot easier for me in Japanese to be asked questions and answer them,” he admitted. But Paul wasn’t just attending those seminars and events in a passive manner. He soon started delivering presentations himself, usually as part of Lightning Talks—again, despite his relatively low level of Japanese. “It doesn’t matter if you do a bad job of it,” he said. Japanese people I think are really receptive to people trying to speak in Japanese and making an effort. I think they’re happy to have someone who isn’t Japanese present, even if they don’t do a great job. He also quickly learned that the most important networking doesn’t take place at the meetup itself. “At least in the past,” he explained, “it was really split . . . [there’s the] seminar time where everyone goes and watches someone present. Everyone’s pretty passive there and there isn’t much conversation going on between attendees. “Then afterwards—and maybe less than half of the people attend—but they go to a restaurant and have drinks after the event. And that’s where all the real socialization happens, and so that’s where I was able to really make the most connections.” That said, Paul noted that the actual “drinking” part of the process has noticeably diminished. “Drinking culture in Japan is changing a lot,” he told me. “I noticed that even when hosting the Tokyo Rubyist Meetup. When we were first hosting it, we [had] an average of 2.5 beers per participant. And more recently, the average is one or less per participant there. “I think there is not so much of an expectation for people to drink a lot. Young Japanese people don’t drink at the same rate, so don’t feel like you actually have to get drunk at these events. You probably shouldn’t,” he added with a laugh. What you should do is be persistent, and patient. It took Paul about a year of very regularly attending events before he felt he was treated as a member of the community. “Literally I was attending more than the typical Japanese person,” he said. “At the peak, there were a couple events per week.” His hard work paid off, though. “I think one thing about Japanese culture,” he said, “is that it’s really group based.” Initially, as foreigners, we see ourselves in the foreign group versus the Japanese group, and there’s kind of a barrier there. But if you can find some other connection, like in my case Ruby, then with these developers I became part of the “Ruby developer group,” and then I felt much more accepted. Eventually he experienced another benefit. “I think it was after a year of volunteering, maybe two years. . . . RubyKaigi, the biggest Ruby conference in Japan and one of the biggest developer conferences in Japan [in general], used Doorkeeper, the event registration system [I created], to manage their event. “That was a big win for us because it showed that we were a serious system to lots of people there. It exposed us to lots of potential users and was one of the things that I think led to us, for a time, being the most popular event registration system among the tech community in Japan.” Based on his experiences, Paul would urge more developers to try attending Japanese dev events. “Because I think a lot of non-Japanese people are still too intimidated to go to these events, even if they have better Japanese ability than I did. “If you look at most of the Japanese developer events happening now, I think the participants are almost exclusively Japanese, but still, that doesn’t need to be the case.” Takeaways What Paul hopes other developers will take away from this article is that networking shouldn’t feel sleazy. Instead, good networking looks like: Being interested in other people. Asking them questions is the easiest way to start a conversation and make a genuine connection. Occasionally just making yourself go to that in-person event. Serendipity can’t happen if you don’t create opportunities for it. Introducing people to each other—it’s a fast and stress-free way to make more connections. Volunteering for events or organizing your own. Supporting offline events with a solid online presence as well. Not being afraid to attend Japanese events, even if your Japanese isn’t good. Above all, Paul stressed, don’t overcomplicate what networking is at its core. Really what networking comes down to is learning about what other people are doing, and how you can help them or how they can help you. Whether you’re online, offline, or doing it in Japanese, that mindset can turn networking from an awkward, sleazy-feeling experience into something you actually enjoy—even on a rainy Tuesday night.

4 days ago • 11 votes

New here?