How I Learned to Stop Worrying and Love the Kernel

~wicrum-wicrun recently left the Urbit Foundation to work on Plunder, a competing Urbit-like system. He wrote this well-written essay describing his reasoning, which is worth a read. This essay is my response to him and to the Plunder project, and an explanation of why I continue to favor Urbit over any competitor I've encountered.

First, some history. I started working on Urbit in 2017. I spent 2018 writing "Ford Turbo", a rewrite of Urbit's linker and build system, with ~littel-ponnys. I had a great time. I learned a lot from ~littel-ponnys. He had previously been at Google, where he worked on Google Chrome for ten years and was also involved in writing the interprocess communication system for Google's Zircon microkernel. ~littel-ponnys and I also share a taste for synthwave music and Angel's Envy rye whiskey, to which he introduced me. He's a good guy.

We released Ford Turbo near the end of 2018. Shortly after that, ~littel-ponnys lost faith in the Urbit project. He began questioning the design of the system: everything from the Nock machine code, to the Arvo kernel, to the event log persistence model. @sol_plunder, another Tlon colleague, also had critiques and a number of ideas. I have since adapted one of @sol_plunder's ideas, a "shared event log", into Urbit's "solid-state subscription" system, which ~wicrum-wicrun implemented earlier this year -- I don't even know if ~wicrum knew that idea originally came from @sol_plunder.

~littel-ponnys lost his faith during a meeting where Curtis Yarvin, as he was resigning from the project, presented an idea for error handling in the kernel. ~littel-ponnys came away from the meeting thinking that the idea, and therefore maybe Curtis in general, was nuts. This got him wondering whether any of the system really made sense -- or whether Curtis was just good at writing in a way that was inspiring, and creative enough to identify the need for new computational foundations to run personal servers, but not rational and measured enough to do a good job on the details of the system architecture.

I disagreed with ~littel-ponnys about the error handling proposal at the time -- I continue to think it's one of a few viable error handling designs -- and I've come to disagree with some of ~littel-ponnys's other critiques of the rest of Urbit, to varying degrees. On a personal level, it's a strange experience for me to have such a strong technical disagreement with someone for whom I have as much respect as I have for ~littel-ponnys.

I was sympathetic to ~littel-ponnys's overall concern, though. I had not done much systems programming before Urbit, so I couldn't say I knew for sure how Urbit compared to other systems. I'd done a bit -- a microcontroller communication protocol, a distributed map-reduce pipeline for n-dimensional arrays, a couple other things here and there -- but I was certainly not an expert in operating systems or programming languages. For the next two years, I spent quite a bit of time with both @sol_plunder and ~littel-ponnys investigating the various layers of Urbit, studying the theory and prior art of languages and operating systems, and collaborating on alternative designs. Other Urbit core devs, especially Philip Monk and Joe Bryan, also engaged in discussion about these subjects during that time.

I'm glad I went as deep as I did in my own investigations. Not only do I feel much more confident in my understanding of computer science after spending that much more time studying operating systems and programming language theory, neither of which was part of my formal education, but I also have a good sense of where the Plunder team is coming from. That leaves me comfortable in my position on their system, which is: it's interesting and has potential, but taken as a whole, I'm not convinced it's better than Nock, Hoon, and Arvo.

During those two years I looked a lot into the different strains of the typed lambda calculus, which is the most common academic approach to functional languages. One of the first things I learned is that Hoon's type system is roughly as mathematically powerful as Haskell's, although some things are different: unlike most languages, which are extensional, Hoon is intensional, so Hoon loses parametricity but can introspect values (although Haskell also loses parametricity because of the seq operation; if you ever need to make a Haskeller angry, tell them "Hask is not a category"). Both languages occupy the "Lambda-Omega" position, also called "F-Omega", in Barendregt's lambda cube. This means both languages can have both terms and types that depend on types. Hoon's approach to this -- the widely and somewhat justifiably reviled "wetness" feature -- is unique among programming languages (although the Zig language's "comptime" compile-time evaluation has a similar mechanism) and to my knowledge has never been studied in formal programming language research.

Of course, neither Hoon nor Haskell can be represented using the pure F-omega, which is strongly normalizing (i.e. all functions finish and return answers) and therefore not Turing-complete. Modeling loops in Haskell or Hoon requires "enrichment" of the basic lambda calculus formalism with "mu-recursion". This enrichment breaks some of the nice mathematical properties of the language, though, and hey, do we really need Turing-completeness in every piece of code run by an Urbit-like system?
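
To make the tradeoff concrete, here's a tiny Haskell sketch (Haskell rather than Hoon, purely for familiarity): a single fixpoint operator is all the "enrichment" you need to recover general recursion, and the same operator is enough to write a well-typed term that never finishes.

```haskell
-- General recursion via an explicit fixpoint operator: the "enrichment"
-- that a strongly normalizing calculus deliberately leaves out.
fix :: (a -> a) -> a
fix f = f (fix f)

-- Useful: an ordinary loop, expressed through fix rather than direct recursion.
factorial :: Integer -> Integer
factorial = fix (\recur n -> if n <= 1 then 1 else n * recur (n - 1))

-- The cost: this is perfectly well typed, and it never returns.
diverge :: Integer
diverge = fix (+ 1)
```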

For a while I thought maybe an Urbit should be built out of an F-Omega left unenriched with recursion: despite not being Turing-complete, it turns out to be capable of self-interpretation, also known as metacircular evaluation, which both Urbit and Plunder recognize as necessary for a universal computational foundation language. In my opinion, the paper that defines a self-interpreter for F-Omega should be considered a landmark paper and deserves much more attention than it's received.

A strongly normalizing language designed to be practically analyzed for upper bounds on memory and CPU usage could form the basis of an Urbit-like system that could also easily be run as a hard-real-time operating system: instead of relying on the kernel unceremoniously killing a userspace process when it exceeds its timeslot, the kernel could reject starting that process in the first place, based on the results of static analysis. Isn't Turing completeness a form of cancer, really? It's why Ethereum needs gas, whose high price almost killed the Urbit project in 2020. George Hotz is right to remove Turing completeness from neural nets, in my opinion.
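
As a toy illustration of what that admission check could look like -- the names and shape here are mine, not any real kernel API -- imagine every userspace task carries an explicit step bound, and its loop is structurally recursive on that bound, so the scheduler can refuse the task up front instead of killing it mid-flight:

```haskell
-- Hypothetical sketch: admit a task only if its declared worst-case
-- step count fits in the slice the scheduler can afford to give it.
runBounded :: Int                 -- budget the scheduler can offer
           -> Int                 -- task's declared step bound ("fuel")
           -> (s -> Either s a)   -- one step: either continue or finish
           -> s                   -- initial state
           -> Maybe a
runBounded budget fuel step s0
  | fuel > budget = Nothing       -- reject before it ever runs
  | otherwise     = go fuel s0
  where
    -- 'go' is structurally recursive on the fuel, so it terminates in
    -- at most 'fuel' steps -- the property a normalizing userspace
    -- language would let the kernel verify rather than trust.
    go 0 _ = Nothing
    go n s = case step s of
               Right done -> Just done
               Left  s'   -> go (n - 1) s'
```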

What would excite me more than Plunder would be an RTOS with a strongly normalizing language for userspace. I certainly hope Urbit would be able to make use of it. I think it could.

There's a good argument, though, that rather than using a typed lambda calculus, the lowest layer should be untyped and types should only exist above that layer. Both Urbit and Plunder agree on this, and Urbit does keep Hoon types out of the Nock layer -- the Urbit runtime doesn't touch Hoon types or think about them at all.

Another dissenting position during those years at Tlon was that Hoon's type system should be rebuilt to use dependent types. I actually provisionally agree with this one: Hoon is already so close to being dependently typed that leaning into that might make sense. Not a high priority, but I'd love to see a self-hosted compiler for a dependently typed Hoonish language.

Developer ergonomics is one reason to try this, but another is its potential impact on the system design. The way the kernel runs Hoon code is dynamically typed, in a way that is effectively dependently typed: consider the metaprogramming applied to userspace mark definition files, which code-gens a core containing gates whose result types are based on the value of the +grad arm, not the type of that arm -- that's dependent typing. But if userspace code does the same thing, the resulting types cannot be trusted by the kernel. A dependently typed language might be able to propagate those type proofs from userspace into kernelspace (there are also other ways to achieve this if we need it, though, and I'm not sure we will need it).
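
Haskell can only approximate this with promoted data kinds and singleton values, but even the approximation shows the flavor: the type of one argument is chosen by the value of another, which is roughly what a mark's +grad-driven core does and what a dependently typed Hoon could state directly. (The names below are invented for illustration.)

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- A tag that exists both as a (singleton) value and as a type.
data Format = Text | Number

data SFormat (f :: Format) where
  SText   :: SFormat 'Text
  SNumber :: SFormat 'Number

-- The payload type is computed from the tag.
type family Payload (f :: Format) where
  Payload 'Text   = String
  Payload 'Number = Int

-- The type of the second argument depends on the value of the first:
-- pass SText and you must supply a String, pass SNumber and an Int.
render :: SFormat f -> Payload f -> String
render SText   s = s
render SNumber n = show n
```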

(Plunder's high-level language is not dependently typed, but I imagine you could compile Idris to Plan.)

The calculus of constructions is another interesting point in the functional language design space. The "Morte" language was a fascinating experiment in using Boehm-Berarducci encoding to enable list operations in a strongly normalizing language. This led the author to develop the production-level Dhall configuration language, which is also strongly normalizing. I still think trying to jet something like Morte would be worth a shot -- although there's also the calculus of inductive constructions, and the calculus of coinductive constructions, and maybe someday soon the co-calculus of coinductive coconuts, so who knows where you would draw the line.
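
For flavor, here's the Boehm-Berarducci trick transplanted into Haskell -- Morte works in a stricter setting, so treat this as an analogy rather than a faithful port. A list is represented by its own fold, so it needs no data constructors and no recursion to consume:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Boehm-Berarducci-encoded list: the list *is* its fold.
newtype BList a = BList { foldBL :: forall r. (a -> r -> r) -> r -> r }

nil :: BList a
nil = BList (\_cons z -> z)

cons :: a -> BList a -> BList a
cons x xs = BList (\c z -> c x (foldBL xs c z))

-- Consuming one requires no recursion at all -- just application.
sumBL :: BList Int -> Int
sumBL xs = foldBL xs (+) 0

-- sumBL (cons 1 (cons 2 (cons 3 nil))) == 6
```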

I spent a fun week in Istanbul with @sol_plunder in late 2019, and if I remember right, we worked a bit on "conq", an earlier Nock alternative language that he was working on at the time. I also recall suggesting a little earlier that we should look into a different model of laying out code in Hoon so that everything would be inlined by the compiler unless it was a closure. I still think that idea might have merit, and ~fodwyt-ragful's experimental jet system for Nock is at least somewhat similar. I bring this up to add some specifics to the idea that I know the Plunder guys pretty well.

Conq was essentially just a concatenative version of Nock. Concatenative languages have their own merits: Forth is the classic example, and Joy is a pure-functional concatenative language. As Joy's author points out, concatenation of Joy programs (a syntactic monoid) maps homomorphically onto composition of their meanings (a semantic monoid). David Barbour's Awelon project specs out a somewhat Urbit-like system using a concatenative language.

The monoid stuff sounds like mumbo-jumbo, but it would enable a class of optimization where an input stream could be reduced and partially evaluated before it even enters the system, which I still think about sometimes. Despite that, I think factoring an Urbit-like system as a stream of functions operating on an inert state (the argument) is inferior to Urbit's "subject-oriented" design, where your OS is the function and inputs are streamed to it. Urbit's version of this system -- as is true for just about all of Urbit's subsystems when compared to alternatives, including Plunder -- is more data-oriented, and in the long run this is one of its saving graces. It's often a subtle difference, but the Urbit mindset includes the principle that if you have a choice, data is always better than code.
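
Here's the homomorphism in miniature, in Haskell rather than Joy, with an invented toy token set: program texts form a monoid under concatenation, stack functions form a monoid under composition, and evaluation maps one onto the other -- which is why a prefix of the input stream could be collapsed to a single function before the rest of it ever shows up.

```haskell
-- A tiny Joy-flavored stack language (invented for illustration).
data Tok = Push Int | Add | Dup

-- Meaning of one token: a stack-to-stack function.
step :: Tok -> [Int] -> [Int]
step (Push n) st           = n : st
step Add      (a : b : st) = (a + b) : st
step Dup      (a : st)     = a : a : st
step _        st           = st          -- underflow: leave the stack as-is

-- The homomorphism: eval [] == id and eval (p ++ q) == eval q . eval p.
-- Concatenating programs corresponds to composing their meanings.
eval :: [Tok] -> [Int] -> [Int]
eval = foldr (\t rest -> rest . step t) id
```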

I don't remember why @sol_plunder abandoned conq. He kept churning out Nock-alternative languages while he worked at Tlon, and Plan does look like it's better than any of those by far.

On the systems side, in 2019 I helped ~littel-ponnys sketch out an Arvo alternative called "Nuevo", which was an attempt to create a deterministic multicore system, based on learnings from the exokernel and Barrelfish operating system projects. Both of those projects address common failure modes in modern operating systems.

The exokernel project gets the kernel out of the "data plane" and firmly into the "control plane", meaning the kernel enforces permissions for I/O without all those bytes having to flow through the kernel. Routing every byte through the kernel forces processor context switches and is too slow for extremely high-bandwidth networking applications, which often end up moving into kernelspace to sidestep this issue.

Barrelfish removed a multiprocessing bottleneck from the OS by effectively turning the kernel into a distributed system in which cores could only communicate with each other using message passing (instead of shared memory). This matters because Amdahl's law kills your parallelism gains if you have any single-threaded bottlenecks once you exceed a few cores: even 5% serial work caps you at roughly a 15x speedup on 64 cores, and at 20x no matter how many cores you add.

Enforcing determinism (to be precise, a "serializable history") on a tree of processes spread across cores is nontrivial, but it could be done, and probably efficiently with a trick or two; ~littel-ponnys and I had a whiteboard sketch of it back then. Some other ideas were thrown around at Tlon in 2019, such as building a software-transactional memory system by dividing the Urbit state up by paths, so that the runtime could do optimistic multithreading as an optimization. If all a piece of userspace code can do is read from and write to data stored at paths, the runtime can run a nominally later app activation at the same time as an earlier one, as long as it can roll back the later activation whenever it read from any path the earlier one wrote to -- just like a "hazard" system in a CPU.
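
Here's a minimal sketch of that hazard check, assuming the whole optimization boils down to read and write sets over paths -- every name here is hypothetical, not anything Vere implements. Run the second activation against the same snapshot as the first, and keep its result only if its reads don't overlap the first activation's writes:

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set        as S

type Path  = [String]
type State = M.Map Path String

-- What an activation reports back: the paths it wrote (with new values)
-- and the paths it read along the way.
data Outcome = Outcome
  { writes :: M.Map Path String
  , reads  :: S.Set Path
  }

-- Optimistically run two activations against the same snapshot.
-- Keep the second's result only if it read nothing the first wrote;
-- otherwise roll it back and rerun it serially, like a pipeline hazard.
commitPair :: State -> (State -> Outcome) -> (State -> Outcome) -> State
commitPair st first second =
  let o1       = first st
      afterOne = M.union (writes o1) st            -- first one commits
      o2       = second st                         -- ran speculatively in parallel
      hazard   = not (S.null (reads o2 `S.intersection` M.keysSet (writes o1)))
  in if hazard
       then M.union (writes (second afterOne)) afterOne  -- conflict: redo serially
       else M.union (writes o2) afterOne                 -- speculation paid off
```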

None of these things are easy to build, though, and to their credit, Plunder does not incorporate anything quite so exotic. My biggest critique of Plunder is the same factual claim as ~wicrum-wicrun's critique of Urbit, but with opposing valence: Plunder is not ambitious enough. In order to standardize computing, a good language and interpreter are not enough. We need a better operating system.

Plunder's language VM is likely faster than Urbit's, at least for the next year or two, and the system is almost certainly easier to hack on, since you don't have to deal with the Arvo kernel -- well, you don't have to deal with your own kernel anyway, since Plunder is best thought of as a particularly nice way of building deterministic Unix processes. I suspect it really will be quite nice, but it's not an Urbit. I expect future Urbit competitors to aim lower by omitting a kernel too -- as Ronnie Coleman said, "everybody wants to be a bodybuilder, but don't nobody wanna lift no heavy-ass weight".

What happens without a kernel? Well, now you have lots of different processes that have to be managed somehow, ultimately by a user, who one would hope has better things to do. These processes have to have some way to find each other. Their identities need to be correlated over the network using some kind of public key infrastructure. Communication between the processes is unreliable, so idempotence and exactly-once delivery are difficult to achieve. Will Plunder solve these problems? I bet they will, partially, by supplying opt-in libraries that let some applications deal with these things. Will they solve them better than Urbit has solved them? I doubt it, since most of the issues with Urbit come from taking a nice low-level design for a system and then trying to flesh out the higher layers.

I'll tell you a story with a moral. A bug in Urbit's over-the-air update system prevented us from deploying new kernel versions for the last half of 2022. Fixing the bug required rewriting the update system, which is why it took so long. The fix was to remove concurrency and asynchrony from the kernel modules involved in updates. After the rewrite, instead of back-and-forth communication for each application, the filesystem sends one large command to the application runner to tell it which applications to run.

Now all the applications can update in one atomic unit, in the same transaction during which the kernel updates. Since then, we've pushed out lots of updates without issues in that part of the system. The moral of the story is this: even within the same purely functional, deterministic, single-threaded system, race conditions and asynchronicity were causing so many bugs that we had to rewrite the system to avoid them. If each of those applications were a separate Unix process, like in Plunder -- or anything lacking an Arvo -- establishing a transaction that's atomic among all of them would require far more work, on the order of N^2, where N is the number of applications.

It is certainly true that if you don't try to solve this problem, you can focus your efforts on other problems, at least at first. You can even nitpick this example and say that without the kernel's presence you wouldn't need the applications to update in lockstep with it. But it's not even remotely the only problem of its kind. Ask anyone who's worked on microservices architectures how much fun it is to establish atomicity across multiple heterogeneous databases.

My prediction for Plunder is that work on its runtime, kernelspace, and initial applications will move much faster than current Urbit development at first. But once multiple applications are deployed, with real users, all needing to push out updates and take advantage of new Plunder features as they come out, work will slow down until it's slower than it is on Urbit.

For all its flaws, Urbit has done more to think through peer-to-peer versioning and migrations than anybody else. The runtime's new "epoch system" logs the runtime version of the first run to allow for principled fixes to jet mismatches; Nock can run other Nock code natively; the Hoon language can compile and run other Hoon code natively; the Arvo kernel can hot-reload itself multiple times in the same event, while maintaining the call stacks of all its kernel modules; the runtime and kernel coordinate to ensure they're compatible with each other; the user has enough control to ensure that the kernel will only update if applications are ready for it; apps synchronize data in a way that maximizes interoperability across protocol updates.

For a long time most people didn't understand why Urbit was built this way. But check out the excellent blog of Joe Duffy, who worked on Microsoft's Midori OS, which like Urbit was a language-based system. Despite their stellar team and a lot of interesting learnings from that project, they never figured out how to clean up userspace callback code, leading to space leaks they couldn't get rid of. Urbit has caught a lot of flak for Arvo's "duct system", which defunctionalizes (remember? data is better than code) the call stack of kernel modules into pieces of concrete data, printable as paths -- but it solves this problem.
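
The difference is easy to see in a toy sketch (this is an illustration of defunctionalization, not Arvo's actual duct type): if a pending request's continuation is a closure, all you can do is call it; if it's plain data naming the return path, you can print it, persist it, count it, and decide it's stale -- exactly the leverage Midori's callback code lacked.

```haskell
-- A closure-based continuation: opaque.  You can run it, and that's all.
type Callback = String -> IO ()

-- A defunctionalized one: plain data describing where the answer goes.
-- (Constructors invented for the example.)
data Duct
  = ReplyHttp Int        -- answer an HTTP request on this connection
  | WakeTimer String     -- deliver a firing to this named timer
  deriving (Show, Eq)

-- One interpreter turns the data back into behavior when needed...
resume :: Duct -> String -> IO ()
resume (ReplyHttp conn) msg = putStrLn ("http #" ++ show conn ++ ": " ++ msg)
resume (WakeTimer name) msg = putStrLn ("timer " ++ name ++ ": " ++ msg)

-- ...and because ducts are data, pending work can be inspected, logged,
-- or cleaned up, which an opaque Callback never allows.
pendingReport :: [Duct] -> String
pendingReport ducts = show (length ducts) ++ " outstanding: " ++ show ducts
```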

Sometimes I still think about Urbit's kernel and runtime in terms I learned from those years of taking a critical eye to them. For instance, reducing the overhead of the Arvo kernel's outer dispatch loop can be thought of as "getting the kernel out of the data plane", at least partially, since ideally Arvo just passes a pointer to the data through itself to the runtime, without modifying the data itself or inducing context switches in the host OS. An Urbit unikernel could take this further.

I also think a fair amount about designing for parallelism in Urbit. Serving data through the Urbit network needs to scale elastically without the user whose post goes viral needing to configure anything. This is the guiding principle behind one of the largest ongoing kernelspace efforts, the "subscription reform" project. We'll know we're doing well with that once I can serve a piece of data from my ship that's running on a laptop, and my star's load-balanced horizontally scaled bank of scry cache servers handles the request load when it gets millions of views, without ever running Nock or doing a disk write.

In order for this to work, an information-theoretic perspective is required: the Arvo kernel must not ingest any new information in order to fulfill the requests for data. This is the way to guarantee it won't need to run any events, do any disk writes, or introduce any single-thread bottlenecks into the network. This is not fully fleshed out yet, but the basic scalable network protocol is now deployed and used in software distribution, with additions to support more use cases taking high priority within core dev.

~wicrum-wicrun wrote in his essay that Plunder is unopinionated, whereas Urbit is opinionated. I consider this a gross overstatement: Plunder is a purely functional system with a hard-coded set of datatypes. ~wicrum: I regret to inform you that restricting all programs to pure functions constitutes an opinion, and a rather strong one at that.

A less opinionated take would be more like what George Hotz said we should do: use a deterministic subset of x86. Similarly, lots of people think Urbit should use WASM as its base language. Both are fine ideas, and native code would at least pay a much smaller performance price than Plunder, which is garbage-collected, so writing the next Call of Duty in it would be more viable. A deterministic x86 subset would err almost entirely on the side of performance, whereas Urbit errs almost entirely on the side of simplicity. Plunder is somewhere in between, but much closer to Urbit -- a point on the Pareto frontier that I'm not sure is wise.

In my opinion, neither of these options is legible enough to enable the kinds of full-stack optimizations that Urbit will eventually enable. That's right -- a few decades from now, Urbit will be faster than other systems in addition to being more reliable and usable, because true simplicity wins out in the long run.

Twenty-eight years after Angelina Jolie said "RISC architecture is gonna change everything", phones and MacBooks all run it. It's worth noting that this experiment is playing out in front of us: Intel is struggling to improve its x86 ISA for speed and battery usage while ARM (RISC) is taking off. Imagine going even more reduced, to an instruction set with twelve opcodes. Nock is more reduced than Plunder, which can be thought of as a Nock augmented with a couple of performance-oriented doodads ("pin"s and arrays, roughly).

Another debate I've had over and over again the past few years has to do with trees vs. arrays. It came up in the Plunder context too. The debate is a bit of a trap. I can't tell you how many times I've heard "computers like linear memory, so why would you make trees the basic data structure". This misses several points. At big enough scale, everything becomes a tree again. It might be a B+ tree, or it might be some more specialized data structure if you're actually dealing with that kind of data. Plunder's claim that Urbit can't do data jets to address this is not true -- in fact, I think both systems end up with essentially the same problems with data jets if they implement them. We've just had much more pressing problems, such as pings from sponsees slowing down sponsors due to unnecessary statefulness, and closing off denial-of-service attack vectors in Gall.

One odd thing about the Nock performance debate is that to date, the amount of money poured into making Nock go faster has been almost zero. Its bytecode interpreter was written by one guy, who isn't even a full-time employee, and that was enough of a win that we were able to delete tens of thousands of lines of jets. Almost all performance problems we've had in Urbit to date are what I call "stop hitting yourself" problems -- things like the runtime serializing a date to a string ten thousand times for no reason on every event, then throwing away the result. That actually happened a couple years ago. Fixing it brought CPU usage down to 4% on Urbit's busiest galaxy.

A less flippant answer is the Ares project, a cross-organizational effort from Zorp, Urbit Foundation, and Tlon that promises to allow Urbit to manage terabytes of data, not just gigabytes. If I had a time machine and a gun, I would go back to 2008 and hold Curtis at gunpoint until he added an #ifdef NOUN_64_BIT to the Urbit codebase. It's annoyingly easy to forget that Urbit's current storage limitation is absolutely not a fundamental design constraint, and it absolutely will go away.

Ares also directly addresses @sol_plunder's original complaint about Nock that he cited as the reason for leaving Tlon: Nock function calls allocate cons cells, which is slow in Vere. Ares has a bump allocator instead of Vere's malloc-style allocator, making allocation almost free -- every language needs to put function arguments somewhere in memory, and a bump allocator makes the speed of that operation about the same as putting those variables in a stack frame like in C. Ares also registerizes Nock formulas, allowing Nock code to look up local variables in constant time rather than the naive logarithmic tree lookup time.

@sol_plunder's complaint is also a little misdirected: the slowest part of a Vere function call is not cell allocation, it's checking for a jet match. Ares performs this check when compiling Nock to machine code, eliminating that runtime overhead entirely. Ares will also remove most "indirect jumps" that result from naive compilation of Nock function calls, allowing the processor to use the pipelining, cache prefetching, and branch prediction optimizations that only work with "direct jumps", i.e. machine code instructions that jump to a hard-coded memory address.

With these changes combined, function calls should no longer be a performance bottleneck for Nock. I'm excited to see how much faster Ares will be.

Ares is the first step in Nock optimization. Maybe there's a performance ceiling, but after six years of working on this project, I've seen so many things that looked like ceilings get blown past that I'm not too worried about it. I've said for many years that if someone dumped the amount of money into making Nock go fast that the browser wars dumped into making JavaScript go fast, Nock would go significantly faster than JavaScript. The fact that you can build a processor that runs Nock also alleviates some performance concerns. A lot of the claims about Plunder's superiority to Nock are also overstated in my opinion: we can distribute Nock over the network if we want, the data storage story is comparable, and that array library that ~wicrum-wicrun mentioned has since been used to implement a neural network in Hoon, with jets coming soon.

What does worry me is the difficulty of working on "vane"s (Arvo kernel modules) and Gall agents (Urbit apps). Those are big, and related, problems that ~wicrum correctly identifies in his writeup. Working on Gall in particular seems to disillusion core devs, including ~wicrum himself, about Urbit more than any other part of the system, and I think I know why.

The code organization in Gall is a mess, the style is a mess, the invariants that must be upheld when working on it are subtle and numerous, the model of userspace code it provides is clunky, and its spec would defy description if anyone tried to write one, which nobody has. In other words, it's a big ball of mud. If all of Urbit looked like Gall, I would, I don't know, change my name and become an arms dealer in Micronesia or something. I certainly wouldn't keep working on Urbit.

In my estimation, this overconstrained, ball-of-mud feeling comes not from the Arvo kernel, Hoon, or Nock (the Arvo kernel proper is just about frozen; the only modification I want to make to it is breadth-first move ordering, which will be very nice). Rather, it comes from Gall and Clay, and to a lesser extent the interactions of those systems with Ames and Eyre.

These vanes got big and nasty because they've been used in production for years. What we'll need to do is keep the hard-earned lessons from that use and boil the vanes down to their essentials, to something onto which I could imagine slapping a Kelvin version without holding my nose. It's not an easy project, but it must be done, and quickly.

Urbit development will not stop anytime soon. It could easily be a hundred years before we get near Kelvin zero. We don't have infinite time to fix its problems, though. Investors have been getting impatient. App development shops have been looking into alternatives.

The ugly truth is that the state of Urbit's core system is the limiting factor in getting over the hump of Urbit becoming usable in a scalable, production-level, security-critical manner. We have our work cut out for us.

So does Plunder, so good luck to them. May the best personal server OS win.