Full Width [alt+shift+f] FOCUS MODE Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
80
Recently the Spritely Institute published an introduction to Petnames, A humane approach to secure, decentralized naming. I have long been a fan of petnames, and graph naming systems in general. I first learned about them in the context of Mark Miller’s E programming language which was a distributed object capability system on the JVM. I gather Spritely are working on something similar, but with a more webby lisp/scheme flavour – and in fact Mark Miller is the second author on this article. An interesting thing that the article sort of hints at is that these kinds of systems have a fairly smooth range of operating points from decentralized petnames all the way to centralized globally unique names. Zooko’s triangle is actually more like a fan when the “human friendly” point is fixed: there’s an arc describing the tradeoffs, from personal through local to global. As a first step away from petnames, I like to use the term “nickname” for nearby instances of what they call “edge names” in...
9 months ago

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Tony Finch's blog

first-class merges and cover letters

Although it looks really good, I have not yet tried the Jujutsu (jj) version control system, mainly because it’s not yet clearly superior to Magit. But I have been following jj discussions with great interest. One of the things that jj has not yet tackled is how to do better than git refs / branches / tags. As I underestand it, jj currently has something like Mercurial bookmarks, which are more like raw git ref plumbing than a high-level porcelain feature. In particular, jj lacks signed or annotated tags, and it doesn’t have branch names that always automatically refer to the tip. This is clearly a temporary state of affairs because jj is still incomplete and under development and these gaps are going to be filled. But the discussions have led me to think about how git’s branches are unsatisfactory, and what could be done to improve them. branch merge rebase squash fork cover letters previous branch workflow questions branch One of the huge improvements in git compared to Subversion was git’s support for merges. Subversion proudly advertised its support for lightweight branches, but a branch is not very useful if you can’t merge it: an un-mergeable branch is not a tool you can use to help with work-in-progress development. The point of this anecdote is to illustrate that rather than trying to make branches better, we should try to make merges better and branches will get better as a consequence. Let’s consider a few common workflows and how git makes them all unsatisfactory in various ways. Skip to cover letters and previous branch below where I eventually get to the point. merge A basic merge workflow is, create a feature branch hack, hack, review, hack, approve merge back to the trunk The main problem is when it comes to the merge, there may be conflicts due to concurrent work on the trunk. Git encourages you to resolve conflicts while creating the merge commit, which tends to bypass the normal review process. Git also gives you an ugly useless canned commit message for merges, that hides what you did to resolve the conflicts. If the feature branch is a linear record of the work then it can be cluttered with commits to address comments from reviewers and to fix mistakes. Some people like an accurate record of the history, but others prefer the repository to contain clean logical changes that will make sense in years to come, keeping the clutter in the code review system. rebase A rebase-oriented workflow deals with the problems of the merge workflow but introduces new problems. Primarily, rebasing is intended to produce a tidy logical commit history. And when a feature branch is rebased onto the trunk before it is merged, a simple fast-forward check makes it trivial to verify that the merge will be clean (whether it uses separate merge commit or directly fast-forwards the trunk). However, it’s hard to compare the state of the feature branch before and after the rebase. The current and previous tips of the branch (amongst other clutter) are recorded in the reflog of the person who did the rebase, but they can’t share their reflog. A force-push erases the previous branch from the server. Git forges sometimes make it possible to compare a branch before and after a rebase, but it’s usually very inconvenient, which makes it hard to see if review comments have been addressed. And a reviewer can’t fetch past versions of the branch from the server to review them locally. You can mitigate these problems by adding commits in --autosquash format, and delay rebasing until just before merge. However that reintroduces the problem of merge conflicts: if the autosquash doesn’t apply cleanly the branch should have another round of review to make sure the conflicts were resolved OK. squash When the trunk consists of a sequence of merge commits, the --first-parent log is very uninformative. A common way to make the history of the trunk more informative, and deal with the problems of cluttered feature branches and poor rebase support, is to squash the feature branch into a single commit on the trunk instead of mergeing. This encourages merge requests to be roughly the size of one commit, which is arguably a good thing. However, it can be uncomfortably confining for larger features, or cause extra busy-work co-ordinating changes across multiple merge requests. And squashed feature branches have the same merge conflict problem as rebase --autosquash. fork Feature branches can’t always be short-lived. In the past I have maintained local hacks that were used in production but were not (not yet?) suitable to submit upstream. I have tried keeping a stack of these local patches on a git branch that gets rebased onto each upstream release. With this setup the problem of reviewing successive versions of a merge request becomes the bigger problem of keeping track of how the stack of patches evolved over longer periods of time. cover letters Cover letters are common in the email patch workflow that predates git, and they are supported by git format-patch. Github and other forges have a webby version of the cover letter: the message that starts off a pull request or merge request. In git, cover letters are second-class citizens: they aren’t stored in the repository. But many of the problems I outlined above have neat solutions if cover letters become first-class citizens, with a Jujutsu twist. A first-class cover letter starts off as a prototype for a merge request, and becomes the eventual merge commit. Instead of unhelpful auto-generated merge commits, you get helpful and informative messages. No extra work is needed since we’re already writing cover letters. Good merge commit messages make good --first-parent logs. The cover letter subject line works as a branch name. No more need to invent filename-compatible branch names! Jujutsu doesn’t make you name branches, giving them random names instead. It shows the subject line of the topmost commit as a reminder of what the branch is for. If there’s an explicit cover letter the subject line will be a better summary of the branch as a whole. I often find the last commit on a branch is some post-feature cleanup, and that kind of commit has a subject line that is never a good summary of its feature branch. As a prototype for the merge commit, the cover letter can contain the resolution of all the merge conflicts in a way that can be shared and reviewed. In Jujutsu, where conflicts are first class, the cover letter commit can contain unresolved conflicts: you don’t have to clean them up when creating the merge, you can leave that job until later. If you can share a prototype of your merge commit, then it becomes possible for your collaborators to review any merge conflicts and how you resolved them. To distinguish a cover letter from a merge commit object, a cover letter object has a “target” header which is a special kind of parent header. A cover letter also has a normal parent commit header that refers to earlier commits in the feature branch. The target is what will become the first parent of the eventual merge commit. previous branch The other ingredient is to add a “previous branch” header, another special kind of parent commit header. The previous branch header refers to an older version of the cover letter and, transitively, an older version of the whole feature branch. Typically the previous branch header will match the last shared version of the branch, i.e. the commit hash of the server’s copy of the feature branch. The previous branch header isn’t changed during normal work on the feature branch. As the branch is revised and rebased, the commit hash of the cover letter will change fairly frequently. These changes are recorded in git’s reflog or jj’s oplog, but not in the “previous branch” chain. You can use the previous branch chain to examine diffs between versions of the feature branch as a whole. If commits have Gerrit-style or jj-style change-IDs then it’s fairly easy to find and compare previous versions of an individual commit. The previous branch header supports interdiff code review, or allows you to retain past iterations of a patch series. workflow Here are some sketchy notes on how these features might work in practice. One way to use cover letters is jj-style, where it’s convenient to edit commits that aren’t at the tip of a branch, and easy to reshuffle commits so that a branch has a deliberate narrative. When you create a new feature branch, it starts off as an empty cover letter with both target and parent pointing at the same commit. Alternatively, you might start a branch ad hoc, and later cap it with a cover letter. If this is a small change and rebase + fast-forward is allowed, you can edit the “cover letter” to contain the whole change. Otherwise, you can hack on the branch any which way. Shuffle the commits that should be part of the merge request so that they occur before the cover letter, and edit the cover letter to summarize the preceding commits. When you first push the branch, there’s (still) no need to give it a name: the server can see that this is (probably) going to be a new merge request because the top commit has a target branch and its change-ID doesn’t match an existing merge request. Also when you push, your client automatically creates a new instance of your cover letter, adding a “previous branch” header to indicate that the old version was shared. The commits on the branch that were pushed are now immutable; rebases and edits affect the new version of the branch. During review there will typically be multiple iterations of the branch to address feedback. The chain of previous branch headers allows reviewers to see how commits were changed to address feedback, interdiff style. The branch can be merged when the target header matches the current trunk and there are no conflicts left to resolve. When the time comes to merge the branch, there are several options: For a merge workflow, the cover letter is used to make a new commit on the trunk, changing the target header into the first parent commit, and dropping the previous branch header. Or, if you like to preserve more history, the previous branch chain can be retained. Or you can drop the cover letter and fast foward the branch on to the trunk. Or you can squash the branch on to the trunk, using the cover letter as the commit message. questions This is a fairly rough idea: I’m sure that some of the details won’t work in practice without a lot of careful work on compatibility and deployability. Do the new commit headers (“target” and “previous branch”) need to be headers? What are the compatibility issues with adding new headers that refer to other commits? How would a server handle a push of an unnamed branch? How could someone else pull a copy of it? How feasible is it to use cover letter subject lines instead of branch names? The previous branch header is doing a similar job to a remote tracking branch. Is there an opportunity to simplify how we keep a local cache of the server state? Despite all that, I think something along these lines could make branches / reviews / reworks / merges less awkward. How you merge should me a matter of your project’s preferred style, without interference from technical limitations that force you to trade off one annoyance against another. There remains a non-technical limitation: I have assumed that contributors are comfortable enough with version control to use a history-editing workflow effectively. I’ve lost all perspective on how hard this is for a newbie to learn; I expect (or hope?) jj makes it much easier than git rebase.

7 hours ago 3 votes
strongly typed?

What does it mean when someone writes that a programming language is “strongly typed”? I’ve known for many years that “strongly typed” is a poorly-defined term. Recently I was prompted on Lobsters to explain why it’s hard to understand what someone means when they use the phrase. I came up with more than five meanings! how strong? The various meanings of “strongly typed” are not clearly yes-or-no. Some developers like to argue that these kinds of integrity checks must be completely perfect or else they are entirely worthless. Charitably (it took me a while to think of a polite way to phrase this), that betrays a lack of engineering maturity. Software engineers, like any engineers, have to create working systems from imperfect materials. To do so, we must understand what guarantees we can rely on, where our mistakes can be caught early, where we need to establish processes to catch mistakes, how we can control the consequences of our mistakes, and how to remediate when somethng breaks because of a mistake that wasn’t caught. strong how? So, what are the ways that a programming language can be strongly or weakly typed? In what ways are real programming languages “mid”? Statically typed as opposed to dynamically typed? Many languages have a mixture of the two, such as run time polymorphism in OO languages (e.g. Java), or gradual type systems for dynamic languages (e.g. TypeScript). Sound static type system? It’s common for static type systems to be deliberately unsound, such as covariant subtyping in arrays or functions (Java, again). Gradual type systems migh have gaping holes for usability reasons (TypeScript, again). And some type systems might be unsound due to bugs. (There are a few of these in Rust.) Unsoundness isn’t a disaster, if a programmer won’t cause it without being aware of the risk. For example: in Lean you can write “sorry” as a kind of “to do” annotation that deliberately breaks soundness; and Idris 2 has type-in-type so it accepts Girard’s paradox. Type safe at run time? Most languages have facilities for deliberately bypassing type safety, with an “unsafe” library module or “unsafe” language features, or things that are harder to spot. It can be more or less difficult to break type safety in ways that the programmer or language designer did not intend. JavaScript and Lua are very safe, treating type safety failures as security vulnerabilities. Java and Rust have controlled unsafety. In C everything is unsafe. Fewer weird implicit coercions? There isn’t a total order here: for instance, C has implicit bool/int coercions, Rust does not; Rust has implicit deref, C does not. There’s a huge range in how much coercions are a convenience or a source of bugs. For example, the PHP and JavaScript == operators are made entirely of WAT, but at least you can use === instead. How fancy is the type system? To what degree can you model properties of your program as types? Is it convenient to parse, not validate? Is the Curry-Howard correspondance something you can put into practice? Or is it only capable of describing the physical layout of data? There are probably other meanings, e.g. I have seen “strongly typed” used to mean that runtime representations are abstract (you can’t see the underlying bytes); or in the past it sometimes meant a language with a heavy type annotation burden (as a mischaracterization of static type checking). how to type So, when you write (with your keyboard) the phrase “strongly typed”, delete it, and come up with a more precise description of what you really mean. The desiderata above are partly overlapping, sometimes partly orthogonal. Some of them you might care about, some of them not. But please try to communicate where you draw the line and how fuzzy your line is.

2 weeks ago 20 votes
p-fast trie, but smaller

Previously, I wrote some sketchy ideas for what I call a p-fast trie, which is basically a wide fan-out variant of an x-fast trie. It allows you to find the longest matching prefix or nearest predecessor or successor of a query string in a set of names in O(log k) time, where k is the key length. My initial sketch was more complicated and greedy for space than necessary, so here’s a simplified revision. (“p” now stands for prefix.) layout A p-fast trie stores a lexicographically ordered set of names. A name is a sequence of characters from some small-ish character set. For example, DNS names can be represented as a set of about 50 letters, digits, punctuation and escape characters, usually one per byte of name. Names that are arbitrary bit strings can be split into chunks of 6 bits to make a set of 64 characters. Every unique prefix of every name is added to a hash table. An entry in the hash table contains: A shared reference to the closest name lexicographically greater than or equal to the prefix. Multiple hash table entries will refer to the same name. A reference to a name might instead be a reference to a leaf object containing the name. The length of the prefix. To save space, each prefix is not stored separately, but implied by the combination of the closest name and prefix length. A bitmap with one bit per possible character, corresponding to the next character after this prefix. For every other prefix that matches this prefix and is one character longer than this prefix, a bit is set in the bitmap corresponding to the last character of the longer prefix. search The basic algorithm is a longest-prefix match. Look up the query string in the hash table. If there’s a match, great, done. Otherwise proceed by binary chop on the length of the query string. If the prefix isn’t in the hash table, reduce the prefix length and search again. (If the empty prefix isn’t in the hash table then there are no names to find.) If the prefix is in the hash table, check the next character of the query string in the bitmap. If its bit is set, increase the prefix length and search again. Otherwise, this prefix is the answer. predecessor Instead of putting leaf objects in a linked list, we can use a more complicated search algorithm to find names lexicographically closest to the query string. It’s tricky because a longest-prefix match can land in the wrong branch of the implicit trie. Here’s an outline of a predecessor search; successor requires more thought. During the binary chop, when we find a prefix in the hash table, compare the complete query string against the complete name that the hash table entry refers to (the closest name greater than or equal to the common prefix). If the name is greater than the query string we’re in the wrong branch of the trie, so reduce the length of the prefix and search again. Otherwise search the set bits in the bitmap for one corresponding to the greatest character less than the query string’s next character; if there is one remember it and the prefix length. This will be the top of the sub-trie containing the predecessor, unless we find a longer match. If the next character’s bit is set in the bitmap, continue searching with a longer prefix, else stop. When the binary chop has finished, we need to walk down the predecessor sub-trie to find its greatest leaf. This must be done one character at a time – there’s no shortcut. thoughts In my previous note I wondered how the number of search steps in a p-fast trie compares to a qp-trie. I have some old numbers measuring the average depth of binary, 4-bit, 5-bit, 6-bit and 4-bit, 5-bit, dns qp-trie variants. A DNS-trie varies between 7 and 15 deep on average, depending on the data set. The number of steps for a search matches the depth for exact-match lookups, and is up to twice the depth for predecessor searches. A p-fast trie is at most 9 hash table probes for DNS names, and unlikely to be more than 7. I didn’t record the average length of names in my benchmark data sets, but I guess they would be 8–32 characters, meaning 3–5 probes. Which is far fewer than a qp-trie, though I suspect a hash table probe takes more time than chasing a qp-trie pointer. (But this kind of guesstimate is notoriously likely to be wrong!) However, a predecessor search might need 30 probes to walk down the p-fast trie, which I think suggests a linked list of leaf objects is a better option.

a month ago 24 votes
p-fast trie: lexically ordered hash map

Here’s a sketch of an idea that might or might not be a good idea. Dunno if it’s similar to something already described in the literature – if you know of something, please let me know via the links in the footer! The gist is to throw away the tree and interior pointers from a qp-trie. Instead, the p-fast trie is stored using a hash map organized into stratified levels, where each level corresponds to a prefix of the key. Exact-match lookups are normal O(1) hash map lookups. Predecessor / successor searches use binary chop on the length of the key. Where a qp-trie search is O(k), where k is the length of the key, a p-fast trie search is O(log k). This smaller O(log k) bound is why I call it a “p-fast trie” by analogy with the x-fast trie, which has O(log log N) query time. (The “p” is for popcount.) I’m not sure if this asymptotic improvement is likely to be effective in practice; see my thoughts towards the end of this note. layout A p-fast trie consists of: Leaf objects, each of which has a name. Each leaf object refers to its successor forming a circular linked list. (The last leaf refers to the first.) Multiple interior nodes refer to each leaf object. A hash map containing every (strict) prefix of every name in the trie. Each prefix maps to a unique interior node. Names are treated as bit strings split into chunks of (say) 6 bits, and prefixes are whole numbers of chunks. An interior node contains a (1<<6) == 64 wide bitmap with a bit set for each chunk where prefix+chunk matches a key. Following the bitmap is a popcount-compressed array of references to the leaf objects that are the closest predecessor of the corresponding prefix+chunk key. Prefixes are strictly shorter than names so that we can avoid having to represent non-values after the end of a name, and so that it’s OK if one name is a prefix of another. The size of chunks and bitmaps might change; 6 is a guess that I expect will work OK. For restricted alphabets you can use something like my DNS trie name preparation trick to squash 8-bit chunks into sub-64-wide bitmaps. In Rust where cross-references are problematic, there might have to be a hash map that owns the leaf objects, so that the p-fast trie can refer to them by name. Or use a pool allocator and refer to leaf objects by numerical index. search To search, start by splitting the query string at its end into prefix + final chunk of bits. Look up the prefix in the hash map and check the chunk’s bit in the bitmap. If it’s set, you can return the corresponding leaf object because it’s either an exact match or the nearest predecessor. If it isn’t found, and you want the predecessor or successor, continue with a binary chop on the length of the query string. Look up the chopped prefix in the hash map. The next chunk is the chunk of bits in the query string immediately after the prefix. If the prefix is present and the next chunk’s bit is set, remember the chunk’s leaf pointer, make the prefix longer, and try again. If the prefix is present and the next chunk’s bit is not set and there’s a lesser bit that is set, return the leaf pointer for the lesser bit. Otherwise make the prefix shorter and try again. If the prefix isn’t present, make the prefix shorter and try again. When the binary chop bottoms out, return the longest-matching leaf you remembered. The leaf’s key and successor bracket the query string. modify When inserting a name, all its prefixes must be added to the hash map from longest to shortest. At the point where it finds that the prefix already exists, the insertion routine needs to walk down the (implicit) tree of successor keys, updating pointers that refer to the new leaf’s predecessor so they refer to the new leaf instead. Similarly, when deleting a name, remove every prefix from longest to shortest from the hash map where they only refer to this leaf. At the point where the prefix has sibling nodes, walk down the (implicit) tree of successor keys, updating pointers that refer to the deleted leaf so they refer to its predecessor instead. I can’t “just” use a concurrent hash map and expect these algorithms to be thread-safe, because they require multiple changes to the hashmaps. I wonder if the search routine can detect when the hash map is modified underneath it and retry. thoughts It isn’t obvious how a p-fast trie might compare to a qp-trie in practice. A p-fast trie will use a lot more memory than a qp-trie because it requires far more interior nodes. They need to exist so that the random-access binary chop knows whether to shorten or lengthen the prefix. To avoid wasting space the hash map keys should refer to names in leaf objects, instead of making lots of copies. This is probably tricky to get right. In a qp-trie the costly part of the lookup is less than O(k) because non-branching interior nodes are omitted. How does that compare to a p-fast trie’s O(log k)? Exact matches in a p-fast trie are just a hash map lookup. If they are worth optimizing then a qp-trie could also be augmented with a hash map. Many steps of a qp-trie search are checking short prefixes of the key near the root of the tree, which should be well cached. By contrast, a p-fast trie search will typically skip short prefixes and instead bounce around longer prefixes, which suggests its cache behaviour won’t be so friendly. A qp-trie predecessor/successor search requires two traversals, one to find the common prefix of the key and another to find the prefix’s predecessor/successor. A p-fast trie requires only one.

a month ago 26 votes
clamp / median / range

Here are a few tangentially-related ideas vaguely near the theme of comparison operators. comparison style clamp style clamp is median clamp in range range style style clash? comparison style Some languages such as BCPL, Icon, Python have chained comparison operators, like if min <= x <= max: ... In languages without chained comparison, I like to write comparisons as if they were chained, like, if min <= x && x <= max { // ... } A rule of thumb is to prefer less than (or equal) operators and avoid greater than. In a sequence of comparisons, order values from (expected) least to greatest. clamp style The clamp() function ensures a value is between some min and max, def clamp(min, x, max): if x < min: return min if max < x: return max return x I like to order its arguments matching the expected order of the values, following my rule of thumb for comparisons. (I used that flavour of clamp() in my article about GCRA.) But I seem to be unusual in this preference, based on a few examples I have seen recently. clamp is median Last month, Fabian Giesen pointed out a way to resolve this difference of opinion: A function that returns the median of three values is equivalent to a clamp() function that doesn’t care about the order of its arguments. This version is written so that it returns NaN if any of its arguments is NaN. (When an argument is NaN, both of its comparisons will be false.) fn med3(a: f64, b: f64, c: f64) -> f64 { match (a <= b, b <= c, c <= a) { (false, false, false) => f64::NAN, (false, false, true) => b, // a > b > c (false, true, false) => a, // c > a > b (false, true, true) => c, // b <= c <= a (true, false, false) => c, // b > c > a (true, false, true) => a, // c <= a <= b (true, true, false) => b, // a <= b <= c (true, true, true) => b, // a == b == c } } When two of its arguments are constant, med3() should compile to the same code as a simple clamp(); but med3()’s misuse-resistance comes at a small cost when the arguments are not known at compile time. clamp in range If your language has proper range types, there is a nicer way to make clamp() resistant to misuse: fn clamp(x: f64, r: RangeInclusive<f64>) -> f64 { let (&min,&max) = (r.start(), r.end()); if x < min { return min } if max < x { return max } return x; } let x = clamp(x, MIN..=MAX); range style For a long time I have been fond of the idea of a simple counting for loop that matches the syntax of chained comparisons, like for min <= x <= max: ... By itself this is silly: too cute and too ad-hoc. I’m also dissatisfied with the range or slice syntax in basically every programming language I’ve seen. I thought it might be nice if the cute comparison and iteration syntaxes were aspects of a more generally useful range syntax, but I couldn’t make it work. Until recently when I realised I could make use of prefix or mixfix syntax, instead of confining myself to infix. So now my fantasy pet range syntax looks like >= min < max // half-open >= min <= max // inclusive And you might use it in a pattern match if x is >= min < max { // ... } Or as an iterator for x in >= min < max { // ... } Or to take a slice xs[>= min < max] style clash? It’s kind of ironic that these range examples don’t follow the left-to-right, lesser-to-greater rule of thumb that this post started off with. (x is not lexically between min and max!) But that rule of thumb is really intended for languages such as C that don’t have ranges. Careful stylistic conventions can help to avoid mistakes in nontrivial conditional expressions. It’s much better if language and library features reduce the need for nontrivial conditions and catch mistakes automatically.

2 months ago 37 votes

More in programming

If Apple cared about privacy

Defaults matter

21 hours ago 5 votes
first-class merges and cover letters

Although it looks really good, I have not yet tried the Jujutsu (jj) version control system, mainly because it’s not yet clearly superior to Magit. But I have been following jj discussions with great interest. One of the things that jj has not yet tackled is how to do better than git refs / branches / tags. As I underestand it, jj currently has something like Mercurial bookmarks, which are more like raw git ref plumbing than a high-level porcelain feature. In particular, jj lacks signed or annotated tags, and it doesn’t have branch names that always automatically refer to the tip. This is clearly a temporary state of affairs because jj is still incomplete and under development and these gaps are going to be filled. But the discussions have led me to think about how git’s branches are unsatisfactory, and what could be done to improve them. branch merge rebase squash fork cover letters previous branch workflow questions branch One of the huge improvements in git compared to Subversion was git’s support for merges. Subversion proudly advertised its support for lightweight branches, but a branch is not very useful if you can’t merge it: an un-mergeable branch is not a tool you can use to help with work-in-progress development. The point of this anecdote is to illustrate that rather than trying to make branches better, we should try to make merges better and branches will get better as a consequence. Let’s consider a few common workflows and how git makes them all unsatisfactory in various ways. Skip to cover letters and previous branch below where I eventually get to the point. merge A basic merge workflow is, create a feature branch hack, hack, review, hack, approve merge back to the trunk The main problem is when it comes to the merge, there may be conflicts due to concurrent work on the trunk. Git encourages you to resolve conflicts while creating the merge commit, which tends to bypass the normal review process. Git also gives you an ugly useless canned commit message for merges, that hides what you did to resolve the conflicts. If the feature branch is a linear record of the work then it can be cluttered with commits to address comments from reviewers and to fix mistakes. Some people like an accurate record of the history, but others prefer the repository to contain clean logical changes that will make sense in years to come, keeping the clutter in the code review system. rebase A rebase-oriented workflow deals with the problems of the merge workflow but introduces new problems. Primarily, rebasing is intended to produce a tidy logical commit history. And when a feature branch is rebased onto the trunk before it is merged, a simple fast-forward check makes it trivial to verify that the merge will be clean (whether it uses separate merge commit or directly fast-forwards the trunk). However, it’s hard to compare the state of the feature branch before and after the rebase. The current and previous tips of the branch (amongst other clutter) are recorded in the reflog of the person who did the rebase, but they can’t share their reflog. A force-push erases the previous branch from the server. Git forges sometimes make it possible to compare a branch before and after a rebase, but it’s usually very inconvenient, which makes it hard to see if review comments have been addressed. And a reviewer can’t fetch past versions of the branch from the server to review them locally. You can mitigate these problems by adding commits in --autosquash format, and delay rebasing until just before merge. However that reintroduces the problem of merge conflicts: if the autosquash doesn’t apply cleanly the branch should have another round of review to make sure the conflicts were resolved OK. squash When the trunk consists of a sequence of merge commits, the --first-parent log is very uninformative. A common way to make the history of the trunk more informative, and deal with the problems of cluttered feature branches and poor rebase support, is to squash the feature branch into a single commit on the trunk instead of mergeing. This encourages merge requests to be roughly the size of one commit, which is arguably a good thing. However, it can be uncomfortably confining for larger features, or cause extra busy-work co-ordinating changes across multiple merge requests. And squashed feature branches have the same merge conflict problem as rebase --autosquash. fork Feature branches can’t always be short-lived. In the past I have maintained local hacks that were used in production but were not (not yet?) suitable to submit upstream. I have tried keeping a stack of these local patches on a git branch that gets rebased onto each upstream release. With this setup the problem of reviewing successive versions of a merge request becomes the bigger problem of keeping track of how the stack of patches evolved over longer periods of time. cover letters Cover letters are common in the email patch workflow that predates git, and they are supported by git format-patch. Github and other forges have a webby version of the cover letter: the message that starts off a pull request or merge request. In git, cover letters are second-class citizens: they aren’t stored in the repository. But many of the problems I outlined above have neat solutions if cover letters become first-class citizens, with a Jujutsu twist. A first-class cover letter starts off as a prototype for a merge request, and becomes the eventual merge commit. Instead of unhelpful auto-generated merge commits, you get helpful and informative messages. No extra work is needed since we’re already writing cover letters. Good merge commit messages make good --first-parent logs. The cover letter subject line works as a branch name. No more need to invent filename-compatible branch names! Jujutsu doesn’t make you name branches, giving them random names instead. It shows the subject line of the topmost commit as a reminder of what the branch is for. If there’s an explicit cover letter the subject line will be a better summary of the branch as a whole. I often find the last commit on a branch is some post-feature cleanup, and that kind of commit has a subject line that is never a good summary of its feature branch. As a prototype for the merge commit, the cover letter can contain the resolution of all the merge conflicts in a way that can be shared and reviewed. In Jujutsu, where conflicts are first class, the cover letter commit can contain unresolved conflicts: you don’t have to clean them up when creating the merge, you can leave that job until later. If you can share a prototype of your merge commit, then it becomes possible for your collaborators to review any merge conflicts and how you resolved them. To distinguish a cover letter from a merge commit object, a cover letter object has a “target” header which is a special kind of parent header. A cover letter also has a normal parent commit header that refers to earlier commits in the feature branch. The target is what will become the first parent of the eventual merge commit. previous branch The other ingredient is to add a “previous branch” header, another special kind of parent commit header. The previous branch header refers to an older version of the cover letter and, transitively, an older version of the whole feature branch. Typically the previous branch header will match the last shared version of the branch, i.e. the commit hash of the server’s copy of the feature branch. The previous branch header isn’t changed during normal work on the feature branch. As the branch is revised and rebased, the commit hash of the cover letter will change fairly frequently. These changes are recorded in git’s reflog or jj’s oplog, but not in the “previous branch” chain. You can use the previous branch chain to examine diffs between versions of the feature branch as a whole. If commits have Gerrit-style or jj-style change-IDs then it’s fairly easy to find and compare previous versions of an individual commit. The previous branch header supports interdiff code review, or allows you to retain past iterations of a patch series. workflow Here are some sketchy notes on how these features might work in practice. One way to use cover letters is jj-style, where it’s convenient to edit commits that aren’t at the tip of a branch, and easy to reshuffle commits so that a branch has a deliberate narrative. When you create a new feature branch, it starts off as an empty cover letter with both target and parent pointing at the same commit. Alternatively, you might start a branch ad hoc, and later cap it with a cover letter. If this is a small change and rebase + fast-forward is allowed, you can edit the “cover letter” to contain the whole change. Otherwise, you can hack on the branch any which way. Shuffle the commits that should be part of the merge request so that they occur before the cover letter, and edit the cover letter to summarize the preceding commits. When you first push the branch, there’s (still) no need to give it a name: the server can see that this is (probably) going to be a new merge request because the top commit has a target branch and its change-ID doesn’t match an existing merge request. Also when you push, your client automatically creates a new instance of your cover letter, adding a “previous branch” header to indicate that the old version was shared. The commits on the branch that were pushed are now immutable; rebases and edits affect the new version of the branch. During review there will typically be multiple iterations of the branch to address feedback. The chain of previous branch headers allows reviewers to see how commits were changed to address feedback, interdiff style. The branch can be merged when the target header matches the current trunk and there are no conflicts left to resolve. When the time comes to merge the branch, there are several options: For a merge workflow, the cover letter is used to make a new commit on the trunk, changing the target header into the first parent commit, and dropping the previous branch header. Or, if you like to preserve more history, the previous branch chain can be retained. Or you can drop the cover letter and fast foward the branch on to the trunk. Or you can squash the branch on to the trunk, using the cover letter as the commit message. questions This is a fairly rough idea: I’m sure that some of the details won’t work in practice without a lot of careful work on compatibility and deployability. Do the new commit headers (“target” and “previous branch”) need to be headers? What are the compatibility issues with adding new headers that refer to other commits? How would a server handle a push of an unnamed branch? How could someone else pull a copy of it? How feasible is it to use cover letter subject lines instead of branch names? The previous branch header is doing a similar job to a remote tracking branch. Is there an opportunity to simplify how we keep a local cache of the server state? Despite all that, I think something along these lines could make branches / reviews / reworks / merges less awkward. How you merge should me a matter of your project’s preferred style, without interference from technical limitations that force you to trade off one annoyance against another. There remains a non-technical limitation: I have assumed that contributors are comfortable enough with version control to use a history-editing workflow effectively. I’ve lost all perspective on how hard this is for a newbie to learn; I expect (or hope?) jj makes it much easier than git rebase.

7 hours ago 3 votes
Many Hard Leetcode Problems are Easy Constraint Problems

In my first interview out of college I was asked the change counter problem: Given a set of coin denominations, find the minimum number of coins required to make change for a given number. IE for USA coinage and 37 cents, the minimum number is four (quarter, dime, 2 pennies). I implemented the simple greedy algorithm and immediately fell into the trap of the question: the greedy algorithm only works for "well-behaved" denominations. If the coin values were [10, 9, 1], then making 37 cents would take 10 coins in the greedy algorithm but only 4 coins optimally (10+9+9+9). The "smart" answer is to use a dynamic programming algorithm, which I didn't know how to do. So I failed the interview. But you only need dynamic programming if you're writing your own algorithm. It's really easy if you throw it into a constraint solver like MiniZinc and call it a day. int: total; array[int] of int: values = [10, 9, 1]; array[index_set(values)] of var 0..: coins; constraint sum (c in index_set(coins)) (coins[c] * values[c]) == total; solve minimize sum(coins); You can try this online here. It'll give you a prompt to put in total and then give you successively-better solutions: coins = [0, 0, 37]; ---------- coins = [0, 1, 28]; ---------- coins = [0, 2, 19]; ---------- coins = [0, 3, 10]; ---------- coins = [0, 4, 1]; ---------- coins = [1, 3, 0]; ---------- Lots of similar interview questions are this kind of mathematical optimization problem, where we have to find the maximum or minimum of a function corresponding to constraints. They're hard in programming languages because programming languages are too low-level. They are also exactly the problems that constraint solvers were designed to solve. Hard leetcode problems are easy constraint problems.1 Here I'm using MiniZinc, but you could just as easily use Z3 or OR-Tools or whatever your favorite generalized solver is. More examples This was a question in a different interview (which I thankfully passed): Given a list of stock prices through the day, find maximum profit you can get by buying one stock and selling one stock later. It's easy to do in O(n^2) time, or if you are clever, you can do it in O(n). Or you could be not clever at all and just write it as a constraint problem: array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; var int: buy; var int: sell; var int: profit = prices[sell] - prices[buy]; constraint sell > buy; constraint profit > 0; solve maximize profit; Reminder, link to trying it online here. While working at that job, one interview question we tested out was: Given a list, determine if three numbers in that list can be added or subtracted to give 0? This is a satisfaction problem, not a constraint problem: we don't need the "best answer", any answer will do. We eventually decided against it for being too tricky for the engineers we were targeting. But it's not tricky in a solver; include "globals.mzn"; array[int] of int: numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array[index_set(numbers)] of var {0, -1, 1}: choices; constraint sum(n in index_set(numbers)) (numbers[n] * choices[n]) = 0; constraint count(choices, -1) + count(choices, 1) = 3; solve satisfy; Okay, one last one, a problem I saw last year at Chipy AlgoSIG. Basically they pick some leetcode problems and we all do them. I failed to solve this one: Given an array of integers heights representing the histogram's bar height where the width of each bar is 1, return the area of the largest rectangle in the histogram. The "proper" solution is a tricky thing involving tracking lots of bookkeeping states, which you can completely bypass by expressing it as constraints: array[int] of int: numbers = [2,1,5,6,2,3]; var 1..length(numbers): x; var 1..length(numbers): dx; var 1..: y; constraint x + dx <= length(numbers); constraint forall (i in x..(x+dx)) (y <= numbers[i]); var int: area = (dx+1)*y; solve maximize area; output ["(\(x)->\(x+dx))*\(y) = \(area)"] There's even a way to automatically visualize the solution (using vis_geost_2d), but I didn't feel like figuring it out in time for the newsletter. Is this better? Now if I actually brought these questions to an interview the interviewee could ruin my day by asking "what's the runtime complexity?" Constraint solvers runtimes are unpredictable and almost always than an ideal bespoke algorithm because they are more expressive, in what I refer to as the capability/tractability tradeoff. But even so, they'll do way better than a bad bespoke algorithm, and I'm not experienced enough in handwriting algorithms to consistently beat a solver. The real advantage of solvers, though, is how well they handle new constraints. Take the stock picking problem above. I can write an O(n²) algorithm in a few minutes and the O(n) algorithm if you give me some time to think. Now change the problem to Maximize the profit by buying and selling up to max_sales stocks, but you can only buy or sell one stock at a given time and you can only hold up to max_hold stocks at a time? That's a way harder problem to write even an inefficient algorithm for! While the constraint problem is only a tiny bit more complicated: include "globals.mzn"; int: max_sales = 3; int: max_hold = 2; array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array [1..max_sales] of var int: buy; array [1..max_sales] of var int: sell; array [index_set(prices)] of var 0..max_hold: stocks_held; var int: profit = sum(s in 1..max_sales) (prices[sell[s]] - prices[buy[s]]); constraint forall (s in 1..max_sales) (sell[s] > buy[s]); constraint profit > 0; constraint forall(i in index_set(prices)) (stocks_held[i] = (count(s in 1..max_sales) (buy[s] <= i) - count(s in 1..max_sales) (sell[s] <= i))); constraint alldifferent(buy ++ sell); solve maximize profit; output ["buy at \(buy)\n", "sell at \(sell)\n", "for \(profit)"]; Most constraint solving examples online are puzzles, like Sudoku or "SEND + MORE = MONEY". Solving leetcode problems would be a more interesting demonstration. And you get more interesting opportunities to teach optimizations, like symmetry breaking. Because my dad will email me if I don't explain this: "leetcode" is slang for "tricky algorithmic interview questions that have little-to-no relevance in the actual job you're interviewing for." It's from leetcode.com. ↩

19 hours ago 3 votes
ARM is great, ARM is terrible (and so is RISC-V)

I’ve long been interested in new and different platforms. I ran Debian on an Alpha back in the late 1990s and was part of the Alpha port team; then I helped bootstrap Debian on amd64. I’ve got somewhere around 8 Raspberry Pi devices in active use right now, and the free NNCPNET Internet email service … Continue reading ARM is great, ARM is terrible (and so is RISC-V) →

19 hours ago 2 votes
Stumbling upon

Something like a channel changer, for the web. That's what the idea was at first. But it led to a whole new path of discovery that even the site's creators couldn't have predicted. The post Stumbling upon appeared first on The History of the Web.

2 days ago 9 votes