Last year I wrote about inlining just the fast path of Lemire's algorithm for nearly-divisionless unbiased bounded random numbers. The idea was to reduce code bloat by eliminating lots of copies of the random number generator in the rarely-executed slow paths. However, a simple split prevented the compiler from being able to optimize cases like pcg32_rand(1 << n), so a lot of the blog post was toying around with ways to mitigate this problem. On Monday, while procrastinating a different blog post, I realised that it's possible to do better: there's a more general optimization which gives us the 1 << n special case for free.

nearly divisionless

Lemire's algorithm has about 4 neat tricks:

1. use multiplication instead of division to reduce the output of a random number generator modulo some limit
2. eliminate the bias in (1) by (counterintuitively) looking at the lower digits
3. fun modular arithmetic to calculate the reject threshold for (2)
4. arrange the reject tests to avoid the slow division in...
2 months ago


More from Tony Finch's blog

the penultimate conditional syntax

About half a year ago I encountered a paper bombastically titled “the ultimate conditional syntax”. It has the attractive goal of unifying pattern match with boolean if tests, and its solution is in some ways very nice. But it seems over-complicated to me, especially for something that’s a basic work-horse of programming. I couldn’t immediately see how to cut it down to manageable proportions, but recently I had an idea. I’ll outline it under the “penultimate conditionals” heading below, after reviewing the UCS and explaining my motivation.

what the UCS?
whence UCS
out of scope
penultimate conditionals
dangling syntax
examples
antepenultimate breath

what the UCS?

The ultimate conditional syntax does several things which are somewhat intertwined and support each other.

An “expression is pattern” operator allows you to do pattern matching inside boolean expressions. Like “match” but unlike most other expressions, “is” binds variables whose scope is the rest of the boolean expression that might be evaluated when the “is” is true, and the consequent “then” clause.

You can “split” tests to avoid repeating parts that are the same in successive branches. For example,

    if num < 0 then -1
    else if num > 0 then +1
    else 0

can be written

    if num < 0 then -1
           > 0 then +1
       else 0

The example shows a split before an operator, where the left hand operand is the same and the rest of the expression varies. You can split after the operator when the operator is the same, which is common for “is” pattern match clauses.

Indentation-based syntax (an offside rule) reduces the amount of punctuation that splits would otherwise need. An explicit version of the example above is

    if { x { { < { 0 then −1 } }; { > { 0 then +1 } }; else 0 } }

(This example is written in the paper on one line. I’ve split it for narrow screens, which exposes what I think is a mistake in the nesting.)

You can also intersperse let bindings between splits.
I doubt the value of this feature, since “is” can also bind values, but interspersed let does have its uses. The paper has an example using let to avoid rightward drift:

    if let tp1_n = normalize(tp1)
       tp1_n is Bot then Bot
       let tp2_n = normalize(tp2)
       tp2_n is Bot then Bot
       let m = merge(tp1_n, tp2_n)
       m is Some(tp) then tp
       m is None then glb(tp1_n, tp2_n)

It’s probably better to use early return to avoid rightward drift. The desugaring uses let bindings when lowering the UCS to simpler constructions.

whence UCS

Pattern matching in the tradition of functional programming languages supports nested patterns that are compiled in a way that eliminates redundant tests. For example, this example checks that e1 is Some(_) once, not twice as written.

    if e1 is Some(Left(lv)) then e2
             Some(Right(rv)) then e3
       None then e4

Being cheeky, I’d say UCS introduces more causes of redundant checks, then goes to great effort to eliminate redundant checks again. Splits reduce redundant code at the source level; the bulk of the paper is about eliminating redundant checks in the lowering from source to core language.

I think the primary cause of this extra complexity is treating the is operator as a two-way test rather than a multi-way match. Splits are introduced as a more general (more complicated) way to build multi-way conditions out of two-way tests.

There’s a secondary cause: the tradition of expression-oriented functional languages doesn’t like early returns. A nice pattern in imperative code is to write a function as a series of preliminary calculations and guards with early returns that set things up for the main work of the function. Rust’s ? operator and let-else statement support this pattern directly. UCS addresses the same pattern by wedging calculate-check sequences into if statements, as in the normalize example above.
out of scope

I suspect UCS’s indentation-based syntax will make programmers more likely to make mistakes, and make compilers have more trouble producing nice error messages. (YAML has put me off syntax that doesn’t have enough redundancy to support good error recovery.)

So I wondered if there’s a way to have something like an “is pattern” operator in a Rust-like language, without an offside rule, and without the excess of punctuation in the UCS desugaring. But I couldn’t work out how to make the scope of variable bindings in patterns cover all the code that might need to use them. The scope needs to extend into the consequent then clause, but also into any follow-up tests – and those tests can branch so the scope might need to reach into multiple then clauses.

The problem was the way I was still thinking of the then and else clauses as part of the outer if. That implied the expression has to be closed off before the then, which troublesomely closes off the scope of any is-bound variables. The solution – part of it, at least – is actually in the paper, where then and else are nested inside the conditional expression.

penultimate conditionals

There are two ingredients:

The then and else clauses become operators that cause early return from a conditional expression. They can be lowered to a vaguely Rust syntax with the following desugaring rules. The 'if label denotes the closest-enclosing if; you can’t use then or else inside the expr of a then or else unless there’s another intervening if.

    then expr  ⟼  && break 'if expr
    else expr  ⟼  || break 'if expr
    else expr  ⟼  || _ && break 'if expr

There are two desugarings for else depending on whether it appears in an expression or a pattern. If you prefer a less wordy syntax, you might spell then as => (like match in Rust) and else as || =>. (For symmetry we might allow && => for then as well.)

An is operator for multi-way pattern-matching that binds variables whose scope covers the consequent part of the expression.
The basic form is like the UCS,

    scrutinee is pattern

which matches the scrutinee against the pattern, returning a boolean result. For example,

    foo is None

Guarded patterns are like,

    scrutinee is pattern && consequent

where the scope of the variables bound by the pattern covers the consequent. The consequent might be a simple boolean guard, for example,

    foo is Some(n) && n < 0

or inside an if expression it might end with a then clause,

    if foo is Some(n) && n < 0 => -1
    // ...

Simple multi-way patterns are like,

    scrutinee is { pattern || pattern || … }

If there is a consequent then the patterns must all bind the same set of variables (if any) with the same types.

More typically, a multi-way match will have consequent clauses, like

    scrutinee is {
           pattern && consequent
        || pattern && consequent
        || => otherwise
    }

When a consequent is false, we go on to try other alternatives of the match, like we would when the first operand of boolean || is false.

To help with layout, you can include a redundant || before the first alternative. For example,

    if foo is {
        || Some(n) && n < 0 => -1
        || Some(n) && n > 0 => +1
        || Some(n) => 0
        || None => 0
    }

Alternatively,

    if foo is {
        Some(n) && ( n < 0 => -1
                  || n > 0 => +1
                  || => 0 )
        || None => 0
    }

(They should compile the same way.)

The evaluation model is like familiar shortcutting && and ||, and the syntax is supposed to reinforce that intuition. The UCS paper spends a lot of time discussing backtracking and how to eliminate it, but penultimate conditionals evaluate straightforwardly from left to right.

The paper briefly mentions as patterns, like

    Some(Pair(x, y) as p)

which in Rust would be written

    Some(p @ Pair(x, y))

The is operator doesn’t need a separate syntax for this feature:

    Some(p is Pair(x, y))

For large examples, the penultimate conditional syntax is about as noisy as Rust’s match, but it scales down nicely to smaller matches.
However, there are differences in how consequences and alternatives are punctuated which need a bit more discussion.

dangling syntax

The precedence and associativity of the is operator is tricky: it has two kinds of dangling-else problem.

The first kind occurs with a surrounding boolean expression. For example, when b = false, what is the value of this?

    b is true || false

It could bracket to the left, yielding false:

    (b is true) || false

Or to the right, yielding true:

    b is { true || false }

This could be disambiguated by using different spellings for boolean or and pattern alternatives. But that doesn’t help for the second kind, which occurs with an inner match.

    foo is Some(_) && bar is Some(_) || None

Does that check foo is Some(_) with an always-true look at bar

    ( foo is Some(_) ) && bar is { Some(_) || None }

Or does it check bar is Some(_) and waste time with foo?

    foo is { Some(_) && ( bar is Some(_) ) || None }

I have chosen to resolve the ambiguity by requiring curly braces {} around groups of alternative patterns. This allows me to use the same spelling || for all kinds of alternation. (Compare Rust, which uses || for boolean expressions, | in a pattern, and , between the arms of a match.)

Curlies around multi-way matches can be nested, so the example in the previous section can also be written,

    if foo is {
        || Some(n) && n < 0 => -1
        || Some(n) && n > 0 => +1
        || { Some(0) || None } => 0
    }

The is operator binds tighter than && on its left, but looser than && on its right (so that a chain of && is gathered into a consequent) and tighter than || on its right so that outer || alternatives don’t need extra brackets.

examples

I’m going to finish these notes by going through the ultimate conditional syntax paper to translate most of its examples into the penultimate syntax, to give it some exercise.
Here we use is to name a value n, as a replacement for the |> abs pipe operator, and we use range patterns instead of split relational operators:

    if foo(args) is {
        || 0 => "null"
        || n && abs(n) is {
            || 101.. => "large"
            || ..10 => "small"
            || => "medium"
        }
    }

In both the previous example and the next one, we have some extra brackets where UCS relies purely on an offside rule.

    if x is {
        || Right(None) => defaultValue
        || Right(Some(cached)) => f(cached)
        || Left(input) && compute(input) is {
            || None => defaultValue
            || Some(result) => f(result)
        }
    }

This one is almost identical to UCS apart from the spellings of and, then, else.

    if name.startsWith("_")
    && name.tailOption is Some(namePostfix)
    && namePostfix.toIntOption is Some(index)
    && 0 <= index && index < arity
    && => Right([index, name])
    || => Left("invalid identifier: " + name)

Here are some nested multi-way matches with overlapping patterns and bound values:

    if e is {
        // ...
        || Lit(value)
           && Map.find_opt(value) is Some(result)
           => Some(result)
        // ...
        || { Lit(value)
          || Add(Lit(0), value)
          || Add(value, Lit(0)) }
           => { print_int(value); Some(value) }
        // ...
    }

The next few examples show UCS splits without the is operator. In my syntax I need to press a few more buttons but I think that’s OK.

    if x == 0 => "zero"
    || x == 1 => "unit"
    || => "?"

    if x == 0 => "null"
    || x > 0 => "positive"
    || => "negative"

    if predicate(0, 1) => "A"
    || predicate(2, 3) => "B"
    || => "C"

The first two can be written with is instead, but it’s not briefer:

    if x is {
        || 0 => "zero"
        || 1 => "unit"
        || => "?"
    }

    if x is {
        || 0 => "null"
        || 1.. => "positive"
        || => "negative"
    }

There’s little need for a split-anything feature when we have multi-way matches.
    if foo(u, v, w) is {
        || Some(x) && x is {
            || Left(_) => "left-defined"
            || Right(_) => "right-defined"
        }
        || None => "undefined"
    }

A more complete function:

    fn zip_with(f, xs, ys) {
        if [xs, ys] is {
            || [x :: xs, y :: ys]
               && zip_with(f, xs, ys) is Some(tail)
               => Some(f(x, y) :: tail)
            || [Nil, Nil] => Some(Nil)
            || => None
        }
    }

Another fragment of the expression evaluator:

    if e is {
        // ...
        || Var(name) && Map.find_opt(env, name) is {
            || Some(Right(value)) => Some(value)
            || Some(Left(thunk)) => Some(thunk())
        }
        || App(lhs, rhs) => // ...
        // ...
    }

This expression is used in the paper to show how a UCS split is desugared:

    if Pair(x, y) is {
        || Pair(Some(xv), Some(yv)) => xv + yv
        || Pair(Some(xv), None) => xv
        || Pair(None, Some(yv)) => yv
        || Pair(None, None) => 0
    }

The desugaring in the paper introduces a lot of redundant tests. I would desugar straightforwardly, then rely on later optimizations to eliminate other redundancies such as the construction and immediate destruction of the pair:

    if Pair(x, y) is Pair(xx, yy)
    && xx is {
        || Some(xv) && yy is {
            || Some(yv) => xv + yv
            || None => xv
        }
        || None && yy is {
            || Some(yv) => yv
            || None => 0
        }
    }

Skipping ahead to the “non-trivial example” in the paper’s fig. 11:

    if e is {
        || Var(x) && context.get(x) is {
            || Some(IntVal(v)) => Left(v)
            || Some(BoolVal(v)) => Right(v)
        }
        || Lit(IntVal(v)) => Left(v)
        || Lit(BoolVal(v)) => Right(v)
        // ...
    }

The next example in the paper compares C# relational patterns. Rust’s range patterns do a similar job, with the caveat that Rust’s ranges don’t have a syntax for exclusive lower bounds.

    fn classify(value) {
        if value is {
            || .. -4.0 => "too low"
            || 10.0 .. => "too high"
            || NaN => "unknown"
            || => "acceptable"
        }
    }

I tend to think relational patterns are the better syntax than ranges.
With relational patterns I can rewrite an earlier example like,

    if foo is {
        || Some(< 0) => -1
        || Some(> 0) => +1
        || { Some(0) || None } => 0
    }

I think with the UCS I would have to name the Some(_) value to be able to compare it, which suggests that relational patterns can be better than UCS split relational operators. Prefix-unary relational operators are also a nice way to write single-ended ranges in expressions. We could simply write both ends to get a complete range, like

    >= lo < hi

or like

    if value is > -4.0 < 10.0 => "acceptable"
    || => "far out"

Near the start I quoted a normalize example that illustrates left-aligned UCS expression. The penultimate version drifts right like the Scala version:

    if normalize(tp1) is {
        || Bot => Bot
        || tp1_n && normalize(tp2) is {
            || Bot => Bot
            || tp2_n && merge(tp1_n, tp2_n) is {
                || Some(tp) => tp
                || None => glb(tp1_n, tp2_n)
            }
        }
    }

But a more Rusty style shows the benefits of early returns (especially the terse ? operator) and monadic combinators.

    let tp1 = normalize(tp1)?;
    let tp2 = normalize(tp2)?;
    merge(tp1, tp2)
        .unwrap_or_else(|| glb(tp1, tp2))

antepenultimate breath

When I started writing these notes, my penultimate conditional syntax was little more than a sketch of an idea. Having gone through the previous section’s exercise, I think it has turned out better than I thought it might.

The extra nesting from multi-way match braces doesn’t seem to be unbearably heavyweight. However, none of the examples have bulky then or else blocks, which are where the extra nesting is more likely to be annoying. But then, as I said before, it’s comparable to a Rust match:

    match scrutinee {
        pattern => { consequent }
    }

    if scrutinee is {
        || pattern => { consequent }
    }

The || lines down the left margin are noisy, but hard to get rid of in the context of a curly-brace language. I can’t reduce them to | like OCaml because what would I use for bitwise OR? I don’t want presence or absence of flow control to depend on types or context.
I kind of like the Prolog / Erlang style of , for && and ; for ||, but that’s well outside what’s legible to mainstream programmers. So, dunno. Anyway, I think I’ve successfully found a syntax that does most of what UCS does, but in a much simpler fashion.

4 days ago
testing data structures per element

Recently, Alex Kladov wrote on the TigerBeetle blog about swarm testing data structures. It’s a neat post about randomized testing with Zig. I wrote a comment with an idea that was new to Alex @matklad, so I’m reposting a longer version here.

differential testing
problems
    grow / shrink
    random elements
element-wise testing
    test loop
    data structure size
    invariants
    performance
conclusion

differential testing

A common approach to testing data structures is to write a second reference implementation that has the same API but is simpler and/or more obviously correct, though it uses more memory or is slower or less concurrent or otherwise not up to production quality. Then, run the production implementation and the reference implementation on the same sequence of operations, and verify that they produce the same results. Any difference is either a bug in the production implementation (probably), or a bug in the reference implementation (unlucky), or a bug in the tests (unfortunate). This is a straightforward differential testing pattern.

problems

There are a couple of difficulties with this kind of basic differential testing.

grow / shrink

The TigerBeetle article talks about adjusting the probabilities of different operations on the data structure to try to explore more edge cases. To motivate the idea, the article talks about adjusting the probabilities of adding or deleting items:

If adding and deleting have equal probability, then the test finds it hard to grow the data structure to interesting sizes that might expose bugs. Unfortunately, if the probability of add is greater than del, then the data structure tends to grow without bound. If the probability of del is greater than add, then it tries to shrink from nothing: worse than equal probabilities!

They could preload the data structure to test how it behaves when it shrinks, but a fixed set of probabilities per run is not good at testing both growth and shrinkage of the same data structure in the same test run.
One way to improve this kind of test is to adjust the probability of add and del dynamically: make add more likely when the data structure is small, and del more likely when it is big. And maybe make add more likely in the first half of a test run and del more likely in the second half.

random elements

The TigerBeetle article glosses over the question of where the tests get fresh elements to add to the data structure. And its example is chosen so it doesn’t have to think about which elements get deleted. In my experience writing data structures for non-garbage-collected languages, I had to be more deliberate about how to create and destroy elements. That led to a style of test that’s more element-centric, as Alex described it.

element-wise testing

Change the emphasis so that instead of testing that two implementations match, test that one implementation obeys the expected behaviour. No need to make a drop-in replacement reference implementation!

What I typically do is pre-allocate an array of elements, with slots that I can set to keep track of how each element relates to the data structure under test. The most important property is whether the element has been added or deleted, but there might be others related to ordering of elements, or values associated with keys, and so on.

test loop

Each time round the loop, choose at random an element from the array, and an action such as add / del / get / … Then, if it makes sense, perform the operation on the data structure with the element. For example, you might skip an add action if the element is already in the data structure, unless you can try to add it and expect an error.

data structure size

This strategy tends to grow the data structure until about 50% of the pre-allocated elements are inserted, then it makes a random walk around this 50% point.
Random walks can diverge widely from their central point, both in theory and in practice, so this kind of testing is reasonably effective at both growing and (to a lesser extent) shrinking the data structure.

invariants

I usually check some preconditions before an action, to verify that the data structure matches the expected properties of the chosen element. This can help to detect earlier that an action on one element has corrupted another element. After performing the action and updating the element’s properties, I check the updated properties as a postcondition, to make sure the action had the expected effects.

performance

John Regehr’s great tutorial, how to fuzz an ADT implementation, recommends writing a checkRep() function that thoroughly verifies a data structure’s internal consistency. A checkRep() function is a solid gold testing tool, but it is O(n) at least and typically very slow. If you call checkRep() frequently during testing, your tests slow down dramatically as your data structure gets larger. I like my per-element invariants to be local and ideally O(1), or O(log n) at worst, so they don’t slow down the tests too much.

conclusion

Recently I’ve used this pattern to exhibit concurrency bugs in an API that’s hard to make thread-safe. Writing the tests has required some cunning to work out what invariants I can usefully maintain and test; what variety of actions I can use to stress those invariants; and what mix of elements + actions I need so that my tests know which properties of each element should be upheld and which can change.

I’m testing multiple implementations of the same API, trying to demonstrate which is safest. Differential testing can tell me that implementations diverge, but not which is correct, whereas testing properties and invariants more directly tells me whether an implementation does what I expect. (Or gives me a useless answer when my tests are weak.)

Which is to say that this kind of testing is a fun creative challenge.
I find it a lot more rewarding than example-based testing.

2 weeks ago
syntax highlighting with tree-sitter

I have added syntax highlighting to my blog using tree-sitter. Here are some notes about what I learned, with some complaining.

static site generator
markdown ingestion
highlighting
    incompatible?!
    highlight names
    class names
    styling code
    results
future work
    frontmatter
    templates
    feed style
    highlight quality

static site generator

I moved my blog to my own web site a few years ago. It is produced using a scruffy Rust program that converts a bunch of Markdown files to HTML using pulldown-cmark, and produces complete pages from Handlebars templates.

Why did I write another static site generator? Well, partly as an exercise when learning Rust. Partly, since I wrote my own page templates, I’m not going to benefit from a library of existing templates. On the contrary, it’s harder to create new templates that work with a general-purpose SSG than to write my own simpler site-specific SSG. It’s miserable to write programs in template languages. My SSG can keep the logic in the templates to a minimum, and do all the fiddly stuff in Rust. (Which is not very fiddly, because my site doesn’t have complicated navigation – compared to the multilevel menus on www.dns.cam.ac.uk for instance.)

markdown ingestion

There are a few things to do to each Markdown file:

split off and deserialize the YAML frontmatter
find the <cut> or <toc> marker that indicates the end of the teaser / where the table of contents should be inserted
augment headings with self-linking anchors (which are also used by the ToC)

Before this work I was using regexes to do all these jobs, because that allowed me to treat pulldown-cmark as a black box: Markdown in, HTML out. But for syntax highlighting I had to be able to find fenced code blocks. It was time to put some code into the pipeline between pulldown-cmark’s parser and renderer. And if I’m using a proper parser I can get rid of a few regexes: after some hacking, now only the YAML frontmatter is handled with a regex.
Sub-heading linkification and ToC construction are fiddly and more complicated than they were before. But they are also less buggy: markup in headings actually works now! Compared to the ToC, it’s fairly simple to detect code blocks and pass them through a highlighter. You can look at my Markdown munger here. (I am not very happy with the way it uses state, but it works.)

highlighting

As well as the tree-sitter-highlight documentation I used femark as an example implementation. I encountered a few problems.

incompatible?!

I could not get the latest tree-sitter-highlight to work as described in its documentation. I thought the current tree-sitter crates were incompatible with each other! For a while I downgraded to an earlier version, but eventually I solved the problem. Where the docs say,

    let javascript_language = tree_sitter_javascript::language();

They should say:

    let javascript_language = tree_sitter::Language::new(
        tree_sitter_javascript::LANGUAGE
    );

highlight names

I was offended that tree-sitter-highlight seems to expect me to hardcode a list of highlight names, without explaining where they come from or what they mean. I was doubly offended that there’s an array of STANDARD_CAPTURE_NAMES but it isn’t exported, and doesn’t match the list in the docs. You mean I have to copy and paste it? Which one?!

There’s some discussion of highlight names in the tree-sitter manual’s “syntax highlighting” chapter, but that is aimed at people who are writing a tree-sitter grammar, not people who are using one. Eventually I worked out that tree_sitter_javascript::HIGHLIGHT_QUERY in the tree-sitter-highlight example corresponds to the contents of a highlights.scm file. Each @name in highlights.scm is a highlight name that I might be interested in. In principle I guess different tree-sitter grammars should use similar highlight names in their highlights.scm files? (Only to a limited extent, it turns out.)
I decided the obviously correct list of highlight names is the list of every name defined in the HIGHLIGHT_QUERY. The query is just a string so I can throw a regex at it and build an array of the matches. This should make the highlighter produce <span> wrappers for as many tokens as possible in my code, which might be more than necessary, but I don’t have to style them all.

class names

The tree-sitter-highlight crate comes with a lightly-documented HtmlRenderer, which does much of the job fairly straightforwardly. The fun part is the attribute_callback. When the HtmlRenderer is wrapping a token, it emits the start of a <span then expects the callback to append whatever HTML attributes it thinks might be appropriate. Uh, I guess I want a class="..." here?

Well, the highlight names work a little bit like class names: they have dot-separated parts which tree-sitter-highlight can match more or less specifically. (However I am telling it to match all of them.) So I decided to turn each dot-separated highlight name into a space-separated class attribute. The nice thing about this is that my Rust code doesn’t need to know anything about a language’s tree-sitter grammar or its highlight query. The grammar’s highlight names become CSS class names automatically.

styling code

Now I can write some simple CSS to add some colours to my code. I can make type names green,

    code span.hilite.type {
        color: #aca;
    }

If I decide builtin types should be cyan like keywords I can write,

    code span.hilite.type.builtin,
    code span.hilite.keyword {
        color: #9cc;
    }

results

You can look at my tree-sitter-highlight wrapper here. Getting it to work required a bit more creativity than I would have preferred, but it turned out OK. I can add support for a new language by adding a crate to Cargo.toml and a couple of lines to hilite.rs – and maybe some CSS if I have not yet covered its highlight names. (Like I just did to highlight the CSS above!)
future work

While writing this blog post I found myself complaining about things that I really ought to fix instead.

frontmatter

I might simplify the per-page source format knob so that I can use pulldown-cmark’s support for YAML frontmatter instead of a separate regex pass. This change will be easier if I can treat the html pages as Markdown without mangling them too much (is Markdown even supposed to be idempotent?). More tricky are a couple of special case pages whose source is Handlebars instead of Markdown.

templates

I’m not entirely happy with Handlebars. It’s a more powerful language than I need – I chose Handlebars instead of Mustache because Handlebars works neatly with serde. But it has a dynamic type system that makes the templates more error-prone than I would like. Perhaps I can find a more static Rust template system that takes advantage of the close coupling between my templates and the data structure that describes the web site. However, I like my templates to be primarily HTML with a sprinkling of insertions, not something weird that’s neither HTML nor Rust.

feed style

There’s no CSS in my Atom feed, so code blocks there will remain unstyled. I don’t know if feed readers accept <style> tags or if it has to be inline styles. (That would make a mess of my neat setup!)

highlight quality

I’m not entirely satisfied with the level of detail and consistency provided by the tree-sitter language grammars and highlight queries. For instance, in the CSS above, the class names and property names have the same colour because the CSS highlights.scm gives them the same highlight name. The C grammar is good at identifying variables, but the Rust grammar is not. Oh well, I guess it’s good enough for now. At least it doesn’t involve Javascript.

a month ago
random numbers from pcg32 at 200 Gbit/s

One of the neat things about the PCG random number generator by Melissa O’Neill is its use of instruction-level parallelism: the PCG state update can run in parallel with its output permutation. However, PCG only has a limited amount of ILP, about 3 instructions. Its overall speed is limited by the rate at which a CPU can run a sequence where the output of one multiply-add feeds into the next multiply-add.

… Or is it? With some linear algebra and some AVX512, I can generate random numbers from a single instance of pcg32 at 200 Gbit/s on a single core. This is the same sequence of random numbers generated in the same order as normal pcg32, but more than 4x faster. You can look at the benchmark in my pcg-dxsm repository.

skip ahead
the insight
multipliers
trying it out
results

skip ahead

One of the slightly weird features that PCG gets from its underlying linear congruential generator is “seekability”: you can skip ahead k steps in the stream of random numbers in log(k) time. The PCG paper (in section 4.3.1) cites Forrest Brown’s paper, random numbers with arbitrary strides, which explains that the skip-ahead feature is useful for reproducibility of monte carlo simulations.

But what caught my eye is the skip-ahead formula. Rephrased in programmer style,

    state[n+k] = state[n] * pow(MUL, k)
               + inc * (pow(MUL, k) - 1) / (MUL - 1)

the insight

The skip-ahead formula says that we can calculate a future state using a couple of multiplications. The skip-ahead multipliers depend only on the LCG multiplier, not on the variable state, nor on the configurable increment. That means that for a fixed skip ahead, we can precalculate the multipliers before compile time. The skip-ahead formula allows us to unroll the PCG data dependency chain.
Normally, four iterations of the PCG state update look like,

    state0 = rng->state
    state1 = state0 * MUL + rng->inc
    state2 = state1 * MUL + rng->inc
    state3 = state2 * MUL + rng->inc
    state4 = state3 * MUL + rng->inc
    rng->state = state4

With the skip-ahead multipliers it looks like,

    state0 = rng->state
    state1 = state0 * MULs1 + rng->inc * MULi1
    state2 = state0 * MULs2 + rng->inc * MULi2
    state3 = state0 * MULs3 + rng->inc * MULi3
    state4 = state0 * MULs4 + rng->inc * MULi4
    rng->state = state4

These state calculations can be done in parallel using NEON or AVX vector instructions. The disadvantage is that calculating future states in parallel requires more multiplications than doing so in series, but that’s OK because modern CPUs have lots of ALUs.

multipliers

The skip-ahead formula is useful for jumping ahead long distances, because (as Forrest Brown explained) you can do the exponentiation in log(k) time using repeated squaring. (The same technique is used for modexp in RSA.) But I’m only interested in the first few skip-ahead multipliers.

I’ll define the linear congruential generator as:

    lcg(s, inc) = s * MUL + inc

Which is used in PCG’s normal state update like:

    rng->state = lcg(rng->state, rng->inc)

To precalculate the first few skip-ahead multipliers, we iterate the LCG starting from one and zero, like this:

    MULs0 = 1
    MULs1 = lcg(MULs0, 0)
    MULs2 = lcg(MULs1, 0)

    MULi0 = 0
    MULi1 = lcg(MULi0, 1)
    MULi2 = lcg(MULi1, 1)

My benchmark code’s commentary includes a proof by induction, which I wrote to convince myself that these multipliers are correct.

trying it out

To explore how well this skip-ahead idea works, I have written a couple of variants of my pcg32_bytes() function, which simply iterates pcg32 and writes the results to a byte array. The variants have an adjustable amount of parallelism.

One variant is written as scalar code in a loop that has been unrolled by hand a few times. I wanted to see if standard C gets a decent speedup, perhaps from autovectorization.
The other variant uses the GNU C portable vector extensions to calculate pcg32 in an explicitly parallel manner. The benchmark also ensures the output from every variant matches the baseline pcg32_bytes().

results

The output from the benchmark harness lists:

    the function variant: either the baseline version, or uN for a scalar loop unrolled N times, or xN for vector code with N lanes
    its speed in bytes per nanosecond (aka gigabytes per second)
    its performance relative to the baseline

There are small differences in style between the baseline and u1 functions, but their performance ought to be basically the same.

Apple clang 16, MacBook Pro M1 Pro. This compiler is eager and fairly effective at autovectorizing. ARM NEON isn’t big enough to get a speedup from 8 lanes of parallelism.

    __  3.66 bytes/ns x 1.00
    u1  3.90 bytes/ns x 1.07
    u2  6.40 bytes/ns x 1.75
    u3  7.66 bytes/ns x 2.09
    u4  8.52 bytes/ns x 2.33
    x2  7.59 bytes/ns x 2.08
    x4 10.49 bytes/ns x 2.87
    x8 10.40 bytes/ns x 2.84

The following results were from my AMD Ryzen 9 7950X running Debian 12 “bookworm”, comparing gcc vs clang, and AVX2 vs AVX512. gcc is less keen to autovectorize so it doesn’t do very well with the unrolled loops. (Dunno why u1 is so much slower than the baseline.)
gcc 12.2 -march=x86-64-v3

    __  5.57 bytes/ns x 1.00
    u1  5.13 bytes/ns x 0.92
    u2  5.03 bytes/ns x 0.90
    u3  7.01 bytes/ns x 1.26
    u4  6.83 bytes/ns x 1.23
    x2  3.96 bytes/ns x 0.71
    x4  8.00 bytes/ns x 1.44
    x8 12.35 bytes/ns x 2.22

clang 16.0 -march=x86-64-v3

    __  4.89 bytes/ns x 1.00
    u1  4.08 bytes/ns x 0.83
    u2  8.76 bytes/ns x 1.79
    u3 10.43 bytes/ns x 2.13
    u4 10.81 bytes/ns x 2.21
    x2  6.67 bytes/ns x 1.36
    x4 12.67 bytes/ns x 2.59
    x8 15.27 bytes/ns x 3.12

gcc 12.2 -march=x86-64-v4

    __  5.53 bytes/ns x 1.00
    u1  5.53 bytes/ns x 1.00
    u2  5.55 bytes/ns x 1.00
    u3  6.99 bytes/ns x 1.26
    u4  6.79 bytes/ns x 1.23
    x2  4.75 bytes/ns x 0.86
    x4 17.14 bytes/ns x 3.10
    x8 20.90 bytes/ns x 3.78

clang 16.0 -march=x86-64-v4

    __  5.53 bytes/ns x 1.00
    u1  4.25 bytes/ns x 0.77
    u2  7.94 bytes/ns x 1.44
    u3  9.31 bytes/ns x 1.68
    u4 15.33 bytes/ns x 2.77
    x2  9.07 bytes/ns x 1.64
    x4 21.74 bytes/ns x 3.93
    x8 26.34 bytes/ns x 4.76

That last result is pcg32 generating random numbers at 200 Gbit/s.

3 months ago 29 votes

More in programming

Building (and Breaking) Your First X86 Assembly Program

We build a minimal X86 assembly program, run it… and hit a crash. But that crash is exactly what makes this program worth writing.

13 hours ago 4 votes
RubyKaigi 2025 Recap

In 2023 I attended RubyKaigi for the first time and also wrote my first recap, which I’m pleased to say was well-received! This was my third time attending RubyKaigi, and I was once again really impressed with the event. I’m eternally grateful to the conference organizers, local organizers (organizers recruited each year who live/lived in the area RubyKaigi is held), designers, NOC team, helpers, sponsors, speakers, and other attendees for helping create such a memorable experience, not just during the three-day conference, but over my entire time in Matsuyama.

What is RubyKaigi?

RubyKaigi is a three-day technology conference focused on the Ruby programming language, held annually “somewhere in Japan.” It attracts a global audience and this year welcomed over 1500 attendees to Matsuyama, Ehime. The traveling nature of the conference means that for the majority of the attendees (not just the international ones), it’s a chance to take a trip—and the days leading up to and following the event are full of fun encounters with other Rubyists as we wander around town. Checking social media a few days before the conference, I saw posts tagged with “RubyKaigi Day –3” and started getting FOMO!

Talks

RubyKaigi featured 3 keynotes, 51 talks, 11 lightning talks, the TRICK showcase, and Ruby Committers and the World. There were talks in the Main Hall, Sub Hall, and Pearls Room, so you frequently had 3 options to choose from at any given time. Despite being held in Japan, RubyKaigi is an international conference that welcomes English speakers; all talks in the Sub Hall are in English, for example, and all the Japanese talks also have real-time translation and subtitles. Organizers put a great deal of thought into crafting the schedule to maximize everyone’s chances of seeing the talks they’re interested in.
For example, every time slot has at least one English and one Japanese talk, and colleagues are scheduled to speak at different times so their work friends don’t have to split their support.

The power of pre-study

One great feature of RubyKaigi is its esoteric talks, delivered by speakers who are enthusiastic experts in their domains. I’m more of a Ruby user than a Ruby committer (the core team who have merge access to the Ruby repository), so every year there are talks during which I understand nothing—and I know I’m not alone in that. One of the topics I struggle with is parsers, so before the conference I created these sketch notes covering “How Do Computers Understand Ruby?”. Then, as I was listening to previously incomprehensible talks I found myself thinking, “I know this concept! I can understand! Wow, that’s cool!”

Sketch notes on “How Do Computers Understand Ruby?”

My plan for next year is to organize my schedule as soon as RubyKaigi’s talks are announced, and create a pre-conference study plan based on the talks I’m going to see. Just when I thought I’d leveled up, I attended Ryo Kajiwara’s talk “You Can Save Lives with End-to-End Encryption in Ruby,” where he talked about the importance of end-to-end encryption and told us all to stop using SMTP. It was a humbling experience because, after the first few slides, I couldn’t understand anything.

Ruby taught me about encoding under the hood

This year’s opening keynote was delivered by Mari Imaizumi, who took us on the journey of how converting the information you want to convey into symbols has been a thing since basically forever, as well as how she originally got interested in encoding. She talked about the competing standards for character encoding, and her experience with mojibake. It made me think about how lucky I am that the internet heavily favours English speakers. Even when I was exploring the Web in the 2000s, it was rare for me to come across content scrambled by encoding.
TRICK 2025: Episode I

There was a point at which it seemed like the awards were going to be a one-man show, as Pen-san took the fifth, fourth, and third places, but the first-place winner was Don Yang, who until then hadn’t received any awards. The moment that stood out for me, though, was when Pen-san was talking about his work that won “Best ASMR”: code in the shape of bubbles that produces the sound of ocean waves when run. Pen-san explained how the sound was made and said something like, “Once you know this, anyone can write this code.” To his right you could see Matz waving his arm like, “No, no!” which captured my own feelings perfectly.

Drawing of Pen-san and Matz

ZJIT: building a next-generation Ruby JIT

Maxime Chevalier-Boisvert started her talk by apologising for YJIT not being fast enough. Because YJIT is hitting a plateau, she is now working on ZJIT. While YJIT uses a technique called Lazy Basic Block Versioning, ZJIT is a method-based JIT. It will be able to “see” more chunks of code and thus be able to optimize more than YJIT.

Ruby committers and the world

There were humorous moments, like when the panel was asked, “What do you want to deprecate in Ruby 4.0?” Matz answered, “I don’t want to deprecate anything. If stuff breaks people will complain to me!” Also, when the question was, “If you didn’t have to think about compatibility, what would you change?” the committers started debating the compatibility of a suggested change, leading the moderator to say committers are always thinking about compatibility. Matz ended this segment with the comment that it might seem like there’s a big gap between those on stage and those in the audience, but it’s not that big—it’s something that you can definitely cross.

Sketch notes for Ruby Committers and the World

Eliminating unnecessary implicit allocations

Despite this being an unfamiliar topic for me, Jeremy Evans explained things so well even I could follow along. I really liked how approachable this talk was!
Jeremy shared how he’d made a bug fix that resulted in Ruby allocating an object where it hadn’t allocated one before. Turns out, even when you’re known for fixing bugs, you can still cause bugs. And as he fixed this case, more cases were added to the code through new commits. To prevent new code changes from adding unnecessary allocations, he wrote the “Allocation Test Suite,” which was included in the Ruby 3.4 release.

Optimizing JRuby 10

JRuby 10 is Ruby 3.4 compatible! What stood out to me the most was the slide showing a long list of CRuby committers, and then three committers on the JRuby side: Charles Nutter (the speaker), his friend, and his son. This was one of those talks where the pre-study really helped—I could better understand just what sort of work goes into JRuby.

Itandi’s sponsor lightning talk

Usually sponsors use their time to talk about their company, but the speaker for Itandi spent most of his time introducing his favorite manga, Shoujiki Fudousan. He encouraged us to come visit the Itandi booth, where they had set up a game in which you could win a copy of the manga.

Sponsors

This year there were a total of 102 sponsors, with so many gold- and platinum-level sponsors that the organizers held a lottery for the booths. To encourage attendees to visit each booth, there was once again a stamp rally with spaces for all 46 booths, although you could reach the pin-badge goal with just 35 stamps. It also helped keep track of where you had and hadn’t been. Since sponsors are an invaluable part of the conference, and they put so much effort into their booths, I always want to be able to show my appreciation and hear what each of them has to say. With 46 to visit, though, it was somewhat difficult! Each booth had plenty of novelties to hand out and also fun activities, such as lotteries, games, surveys and quizzes, and coding challenges.
By day three, though, the warm weather had me apologetically skipping all coding challenges and quizzes, as my brain had melted. For me, the most memorable novelty was SmartHR’s acrylic charm collection! Since they missed out on a booth this year, they instead created 24 different acrylic charms. To collect them, you had to talk to people wearing SmartHR hoodies. I felt that was a unique solution and a great icebreaker.

Collection of SmartHR acrylic charms

I actually sent out a plea on X (Twitter) because I was missing just a few charms—and some SmartHR employees gave me charms from their own collection so I could complete the set! (Although it turns out I’m still missing the table-flipping charm, so if anyone wants to help out here . . . )

Hallway track (Events)

Every year RubyKaigi has various official events scheduled before and after the conference. It’s not just one official after party—this year there were lunches, meetups, drinkups, board games, karaoke, live acts until three a.m., morning group exercises (there’s photographic proof of people running), and even an 18-hour ferry ride. I need sleep to understand the talks, and I need to wake up early because the conference is starting, and I need to stay up late to connect with other attendees! The official events I attended this year were SmartHR’s pre-event study session, the Women and Non-binary Dinner, the RubyKaigi Official Party, the STORES CAFE for Women, the Leaner Board Game Night, RubyKaraoke, RubyMusicMixin 2025, and the codeTakt Drinkup. I got to chat with so many people, including: Udzura-san, who inspired my Ruby study notes; Naoko-san, one of STORES’s founders; and Elle, a fellow Australian who traveled to Japan for RubyKaigi! The venues were also amazing. The official party was in a huge park next to Matsuyama Castle, and the board game event took place in what seemed to be a wedding reception hall.
Compared to the conference, where you are usually racing to visit booths or heading to your next talk, it’s at these events that you can really get to know your fellow Rubyists. But it’s not just about the official events; my time was also full of random, valuable catch-ups and meetings. For lunch, I went out to eat tai meshi (sea bream rice) with some of the ladies I met at the dinner. I was staying at emorihouse, so following the after party we continued drinking and chatting in our rooms. After RubyMusicMixin, I didn’t want the night to end and bumped into a crew headed towards the river. And on day four, the cafe I wanted to go to was full, but I got in by joining Eririn-san, who was already seated. That night I ended up singing karaoke with a couple of international speakers after running into them near Dogo Onsen earlier that day. Part of the joy of RubyKaigi is the impromptu events, the ones that you join because the town is so full of Rubyists you can’t help but run into them.

I organised an event!

This year I organised the Day 0 Women and Non-binary Dinner&DrinkUp. We were sponsored by TokyoDev, and we had a 100 percent turnout! I’m grateful to everyone who came, especially Emori-san, who helped me with taking orders and on-the-day Japanese communications. With this event, I wanted to create a space where women and non-binary people, whether from Japan or overseas, RubyKaigi veterans or first-timers, could connect with each other before the conference started, while enjoying Matsuyama’s local specialities. We’re still a minority among developers, so it was exciting for me to see so many of us together in one place!

Group photo from Women & Non-binary Dinner and DrinkUp!

I’d love to host a similar event next year, so if you or your company is interested in sponsoring, please reach out!
Matz-yama (Matsuyama)

Last year’s RubyKaigi closed with the announcement that “We’ve taken Matz to the ocean [Okinawa], so now it’s time to take him to the mountains.” (Yama means “mountain” in Japanese.) Matsuyama city is located in Ehime, Shikoku, and its famous tourist attractions include Matsuyama Castle and Dogo Onsen, which is said to have inspired the bathhouse in Spirited Away.

RubyKaigi banner on display at Okaido Shopping Street

Ehime is renowned for mikan (蜜柑, mandarin oranges), and everywhere you go there is mikan and Mikyan, Ehime’s adorable mascot character. Before arriving, everyone told me, “In Ehime, mikan juice comes out of the tap,” which I thought meant there were literally pipes with mikan juice flowing through them all over Ehime! However, reality is not so exciting: yes, mikan juice does flow from taps, but there’s clearly a container behind the tap! There’s no underground mikan juice pipe network. 😢

RubyKaigi also highlighted Ehime’s specialties. This year’s theme color was red-orange, break-time snacks were mikan and mikan jelly, the logo paid homage to the cut fruit, and one of the sponsors even had a mikan juice tap at their booth! Also included among this year’s official novelties was a RubyKaigi Imabari towel, since Imabari city in Ehime is world famous for its towels. I’m an absolute fan of how RubyKaigi highlights the local region and encourages attendees to explore the area. Not only does this introduce international attendees to parts of Japan they might otherwise not visit, it’s a chance for local attendees to play tourist as well.

Community

In Matz’s closing keynote, Programming Language for AI Age, he touched on how it’s odd to delegate the fun tasks to an AI. After all, if AI does all the fun things and we do all the hard things, where’s the joy for us in that? To me, creating software is a collaborative activity—through collaboration we produce better software. Writing code is fun! Being able to connect with others is fun!
Talking to new people is fun! I’ve met so many amazing people through the Ruby community, and RubyKaigi has played an important role in that. Through the community I’ve gotten advice, learned new things, and shared resources. My sketch-notes have been appreciated by others, and as I walk around there are #rubyfriends everywhere who all make me feel so welcomed. RubyKaigi attracts a variety of attendees: developers and non-developers, Ruby experts and Ruby beginners. It’s a fun conference with a wonderful community, and even though it’s a technical conference, non-technical people can enjoy participating as well. Community growth comes with its own issues, but I think attracting newcomers is an important aspect of RubyKaigi. As someone who first came as a developer completely new to Ruby, every year I learn more and am inspired to give back to the Ruby community. I hope that RubyKaigi continues to inspire participants to love Ruby, and encourages them to understand and improve it. By growing the Ruby community, we ensure Ruby will continue on as a Programming Language for the AI Age. Looking forward to Hakodate next year, and to seeing you all there!

PS: Surprise, Detective Conan?

I really love the Detective Conan series. This year RubyKaigi Day Three and the 2025 Detective Conan movie premiere were on the same day . . . so as soon as Matsuda-san said, “It’s over!” I ran out of the hall to go watch the movie at Cinema Sunshine Kinuyama. And next year’s RubyKaigi location, Hakodate, was the setting for the 2024 Detective Conan movie. What a deep connection RubyKaigi and Detective Conan have!

Detective Conan decorations set up at the cinema in Kinuyama

20 hours ago 2 votes
Logo:

Cyrillic version of Internet Explorer logo. Because it’s iconic.

yesterday 1 votes
Talking to Espressif’s Bootloader

In my article about Espressif’s Automatic Reset, I briefly showed UART output from the bootloader, but did not go in more details. In this article, I want to go just a bit further, by showing some two-way interactions. We’ll use the initial basic “real” UART setup. Note that I did not connect DTR/RTS to RST/IO0. … Continue reading Talking to Espressif’s Bootloader → The post Talking to Espressif’s Bootloader appeared first on Quentin Santos.

2 days ago 3 votes
Beware the Complexity Merchants

When smart people get their high from building complex systems to solve simple problems, you're not going to have a good time

2 days ago 2 votes