Original concept circa 2008, revisited 2012, hardware designed Feb 2014, firmware designed April-August 2014, project completed August 2014, installed in spare room in 2016, written up in March 2018 (jeeez…)

Acrylic, LEDs, ARM Cortex-M0 microcontroller.

Colourclock

Idea

This has been a long-running project. My notebooks contain sketches of the idea from about 10 years back but I finally made PCBs at the beginning of 2014, finishing construction and firmware late 2014 – and it’s taken me about 4 years to write it up.

The basic concept is 60 radial RGB LEDs in a circle, with light and colour representing analogue ‘hands’. Where the hands cross, pleasing colours result. Joy. There is really nothing like the solid punchy colours of RGB LEDs! Except, maybe, lasers. So if you’re colourblind, this might suck for you. Don’t build one. Sorry. If not, build one!

Design concept

The design is visually simple. It consists of: A ring-shaped PCB, with LEDs mounted around its...


More from axio.ms

MicroMac, a Macintosh for under £5

A microcontroller Macintosh

This all started from a conversation about the RP2040 MCU, and building a simple desktop/GUI for it. I’d made a comment along the lines of “or, just run some old OS”, and it got me thinking about the original Macintosh.

The original Macintosh was released 40.5 years before this post, and is a pretty cool machine especially considering that the hardware is very simple. Insanely Great and folklore.org are fun reads, and give a glimpse into the Macintosh’s development. Memory was a squeeze; the original 128KB version was underpowered and only sold for a few months before being replaced by the Macintosh 512K, arguably a more appropriate amount of memory. But, the 128 still runs some real applications and, though it pre-dates MultiFinder/actual multitasking, I found it pretty charming. As a tourist.

In 1984 the Mac cost roughly 1/3 as much as a VW Golf and, as someone who’s into old computers and old cars, it’s hard to decide which is more frustrating to use.

So back to this £3.80 RPi Pico microcontroller board:

The RP2040’s 264KB of RAM gives a lot to play with after carving out the Mac’s 128KB – how cool would it be to do a quick hack, and play with a Mac on it?

Time passes. A lot of time. But I totally delivered on the janky hack front:

You won’t believe that this quality item didn’t take that long to build.

So the software was obviously the involved part, and turned into work on 3 distinct projects. This post is going to be a “development journey” story, as a kind of code/design/venting narrative. If you’re just here for the pictures, scroll along!

What is pico-mac?

A Raspberry Pi RP2040 microcontroller (on a Pico board), driving monochrome VGA video and taking USB keyboard/mouse input, emulating a Macintosh 128K computer and disc storage.

The RP2040 has easily enough RAM to house the Mac’s memory, plus that of the emulator; it’s fast enough (with some tricks) to meet the performance of the real machine, has USB host capability, and the PIO department makes driving VGA video fairly uneventful (with some tricks). The basic Pico board’s 2MB of flash is plenty for a disc image with OS and software.

Here’s the Pico MicroMac in action, ready for the paperless office of the future:

The Pico MicroMac RISC CISC workstation of the future

I hadn’t really used a Mac 128K much before; a few clicks on a museum machine once. But I knew they ran MacDraw, and MacWrite, and MacPaint. All three of these applications are pretty cool for a 128K machine; a largely WYSIWYG word processor with multiple fonts, and a vector drawing package.

A great way of playing with early Macintosh system software, and applications of these wonderful machines is via https://infinitemac.org, which has shrinkwrapped running the Mini vMac emulator by emscriptening it to run in the browser. Highly recommended, lots to play with.

As a spoiler, MicroMac does run MacDraw, and it was great to play with it on “real fake hardware”:

(Do you find “Pico Micro Mac” doesn’t really scan? I didn’t think this taxonomy through, did I?)

GitHub links are at the bottom of this page: the pico-mac repo has construction directions if you want to build your own!

The journey

Back up a bit. I wasn’t committed to building a Pico thing, but was vaguely interested in whether it was feasible, so started tinkering with building a Mac 128K emulator on my normal computer first.

The three rules

I had a few simple rules for this project:

1. It had to be fun. It’s OK to hack stuff to get it working, it’s not as though I’m being paid for this.
2. I like writing emulation stuff, but I really don’t want to learn 68K assembler, or much about the 68K. There’s a lot of love for 68K out there and that’s cool, but meh I don’t adore it as a CPU. So, right from the outset I wanted to use someone else’s 68K interpreter – I knew there were loads around.

3. Similarly, there are a load of OSes whose innards I’d like to learn more about, but the shittiest early Mac System software isn’t high on the list. Get in there, emulate the hardware, boot the OS as a black box, done.

I ended up breaking 2 of – and sometimes all 3 of – these rules during this project.

The Mac 128K

The machines are generally pretty simple, and of their time. I started with schematics and Inside Macintosh, PDFs of which covered various details of the original Mac hardware, memory map, mouse/keyboard, etc.:

https://tinkerdifferent.com/resources/macintosh-128k-512k-schematics.79/
https://vintageapple.org/inside_o/

Inside Macintosh Volumes I-III are particularly useful for hardware information; also Guide to Macintosh Family Hardware 2nd Edition.

The Macintosh has:

- A Motorola 68000 CPU running at 7.whatever MHz (roughly 8MHz).
- Flat memory, decoded into regions for memory-mapped IO going to the 6522 VIA, the 8530 SCC, and the IWM floppy controller. (Some of the address decoding is a little funky, though.)
- Keyboard and mouse hang off the VIA/SCC chips.
- No external interrupt controller: the 68K has 3 IRQ lines, and there are 3 IRQ sources (VIA, SCC, programmer switch/NMI).
- “No slots” or expansion cards.
- No DMA controller: a simple autonomous PAL state machine scans video (and audio samples) out of DRAM. Video is fixed at 512x342 1BPP.
- The only storage is an internal FDD (plus an external drive), driven by the IWM chip.

The first three Mac models are extremely similar:

- The Mac 128K and Mac 512K are the same machine, except for RAM.
- The Mac Plus added SCSI to a convenient space in the memory map and an 800K floppy drive, which is double-sided whereas the original 400K drive was single-sided.
- The Mac Plus ROM also supports the 128K/512K, and was an upgrade to create the Macintosh 512Ke. ‘e’ for Extra ROM Goodness.

The Mac Plus ROM supports the HD20 external hard disc, and HFS, and Steve Chamberlin has annotated a disassembly of it. This was the ROM to use: I was making a Macintosh 128Ke.

Mac emulator: umac

After about 8 minutes of research, I chose the Musashi 68K interpreter. It’s C, simple to interface to, and had a simple out-of-box example of a 68K system with RAM, ROM, and some IO. Musashi is structured to be embedded in bigger projects: wire in memory read/write callbacks, a function to raise an IRQ, call execute in a loop, done (a sketch of this glue appears below). I started building an emulator around it, which ultimately became the umac project.

The first half (of, say, five halves) went pretty well:

- A simple commandline app loading the ROM image, allocating RAM, providing debug messages/assertions/logging, and configuring Musashi.
- Add address decoding: CPU reads/writes are steered to RAM, or ROM. The “overlay” register lets the ROM boot at 0x00000000 and then trampoline up to a high ROM mirror after setting up CPU exception vectors – this affects the address decoding. This is done by poking a VIA register, so I decoded just that bit of that register for now.
- At this point, the ROM starts running and accessing more non-existent VIA and SCC registers. Added more decoding and a skeleton for emulating these devices elsewhere – the MMIO read/writes are just stubbed out.
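For illustration, here is a minimal sketch of that glue, in the spirit of Musashi’s bundled example rather than umac’s actual code. The region constants and the mmio_*() helpers are placeholders, not the real Mac memory map or umac’s decode logic:

```c
#include <stdint.h>
#include "m68k.h"                      /* Musashi's public interface */

#define RAM_SIZE  (128 * 1024)
#define ROM_BASE  0x400000u            /* placeholder decode, not umac's real map */
#define ROM_SIZE  (128 * 1024)

static uint8_t ram[RAM_SIZE];
static uint8_t rom[ROM_SIZE];          /* ROM image loaded at startup */
static int overlay = 1;                /* boot overlay: ROM also appears at zero */

/* Placeholder MMIO stubs standing in for VIA/SCC/IWM models: */
static unsigned int mmio_read8(unsigned int addr)  { (void)addr; return 0xff; }
static void mmio_write8(unsigned int addr, unsigned int val) { (void)addr; (void)val; }

/* Musashi calls back into these for every CPU memory access: */
unsigned int m68k_read_memory_8(unsigned int addr)
{
    if (overlay && addr < ROM_SIZE)
        return rom[addr];              /* ROM mirrored at 0 until the overlay bit is cleared */
    if (addr < RAM_SIZE)
        return ram[addr];
    if (addr >= ROM_BASE && addr < ROM_BASE + ROM_SIZE)
        return rom[addr - ROM_BASE];
    return mmio_read8(addr);           /* everything else is a device */
}

void m68k_write_memory_8(unsigned int addr, unsigned int val)
{
    if (addr < RAM_SIZE)
        ram[addr] = (uint8_t)val;
    else
        mmio_write8(addr, val);        /* a VIA write is what clears 'overlay' */
}

/* ...the 16- and 32-bit read/write callbacks look the same... */

void emulator_run(void)
{
    m68k_init();
    m68k_set_cpu_type(M68K_CPU_TYPE_68000);
    m68k_pulse_reset();
    for (;;)
        m68k_execute(10000);           /* interleave VIA timers, video and IRQ work here */
}
```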
There are some magic addresses that the ROM accesses that “miss” documented devices: there’s a manufacturing test option that probes for a plugin (just thunk it), and then we witness the RAM size probing. The Mac Plus ROM is looking for up to 4MB of RAM. In the large region devoted to RAM, the smaller amount of actual RAM is mirrored over and over, so the probe writes a magic value at high addresses and spots where it starts to wrap around. RAM is then initialised and filled with a known pattern.

This was an exciting point to get to because I could dump the RAM, convert the region used for the video framebuffer into an image, and see the “diagonal stripe” pattern used for RAM testing! “She’s alive!”

Not all of the device code enjoyed reading all zeroes, so there was a certain amount of referring to the disassembly and returning, uh, 0xffffffff sometimes to push it further. The goal was to get it as far as accessing the IWM chip, i.e. trying to load the OS. After seeing some IWM accesses there and returning random rubbish values, the first wonderful moment was getting the “Unknown Disc” icon with the question mark – real graphics! The ROM was REALLY DOING SOMETHING!

I think I hadn’t implemented any IRQs at this point, and found the ROM in an infinite loop: it was counting a few Vsyncs to delay the flashing question mark. Diversion into a better VIA, with callbacks for GPIO register read/write, and IRQ handling. This also needed to wire into Musashi’s IRQ functions.

This was motivating to get to – remembering rule #1 – and “graphics”, even though via a manual memory dump/ImageMagick conversion, was great. I knew the IWM was an “interesting” chip, but didn’t know details. I planned to figure it out when I got there (rule #1).

IWM, 68K, and disc drivers

My god, I’m glad I put IWM off until this point. If I’d read the “datasheet” (vague register documentation) first, I’d’ve just gone to the pub instead of writing this shitty emulator.

IWM is very clever, but very very low-level. The disc controllers in other contemporary machines, e.g. the WD1770, abstract the disc physics: at one level, you can poke regs to step to track 17 and then ask the controller to grab sector 3. Not so with IWM: first, the discs are Constant Linear Velocity, meaning the angular rotation needs to change according to whichever track you’re on, and second, the IWM just gives the CPU a firehose of crap from the disc head (with minimal decoding).

I spent a while reading through the disassembly of the ROM’s IWM driver (breaking rule #2 and rule #1): there’s some kind of servo control loop where the driver twiddles PWM values sent to a DAC to control the disc motor, measured against a VIA timer reference to do some sort of dynamic rate-matching to get the correct bitrate from the disc sectors. I think once it finds the track start it then streams the track into memory, and the driver decodes the symbols (more clever encoding) and selects the sector of interest.

I was sad. Surely Basilisk II and Mini vMac etc. had solved this in some clever way – they emulated floppy discs. I learned they do not, and do the smart engineering thing instead: avoid the problem. The other emulators do quite a lot of ROM patching: the ROM isn’t run unmodified. You can argue that this then isn’t a perfect hardware emulation if you’re patching out inconvenient parts of the ROM, but so what. I suspect they were abiding by a rule #1 of their own, too.

I was going to do the same: I figured out a bit of how the Mac driver interface works (gah, rule #3!)
and understood how the other emulators patched this. They use a custom paravirtualised 68K driver which is copied over the ROM’s IWM driver, servicing .Sony requests from the block layer and routing them to more convenient host-side code to manage the requests. Basilisk II uses some custom 68K opcodes and a simple driver, and Mini vMac a complex driver with trappy accesses to a custom region of memory.

I reused the Basilisk II driver but converted it to access a trappy region (easier to route: just emulate another device). The driver callbacks land in the host/C side and some cut-down Basilisk II code interprets the requests and copies data to/from the OS-provided buffers. Right now, all I needed was to read blocks from one disc: I didn’t need different formats (or even write support), or multiple drives, or ejecting/changing images.

Getting the first block loaded from disc took waaaayyy longer than the first part. And, I’d had to learn a bit of 68K (gah), but just in the nick of time I got a Happy Mac icon as the System software started to load.

This was still a simple Linux commandline application, with zero UI. No keyboard or mouse, no video. Time to wrap it in an SDL2 frontend (the unix_main test build in the umac project), and I could watch the screen redraw live. I hadn’t coded the 1Hz timer interrupt into the VIA, and after adding that it booted to a desktop!

The first boot

As an aside, I try to create a dual-target build for all my embedded projects, with a native host build for rapid prototyping/debugging; libSDL instead of an LCD. It means I don’t need to code at the MCU, so I can code in the garden. :)

Next was mouse support. Inside Macintosh and the schematics show how it’s wired, to the VIA (good) and the SCC (a beast). The SCC is my second least-favourite chip in this machine; it’s complex and the datasheet/manual seems to be intentionally written to hide information, piss off readers, get one back at the world. (I didn’t go near the serial side, its main purpose, just external IRQ management. But, it’ll do all kinds of exciting 1980s line coding schemes, offloading bitty work from the CPU. It was key for supporting things like AppleTalk.)

Life was almost complete at this point; with a working mouse I could build a new disc image (using Mini vMac, an exercise in itself) with Missile Command. This game is pretty fun for under 10KB on disc. So:

- Video works
- Boots from disc
- Mouse works, Missile Command

I had no keyboard, but it’s largely working now. Time to start on sub-project numero due:

Hardware and RP2040

Completely unrelated to umac, I built up a circuit and firmware with two goals:

- Display 512x342x1 video to VGA with minimal components,
- Get the TinyUSB HID example working and integrated.

This would just display a test image copied to a framebuffer, and printf() keyboard/mouse events, as a PoC.

The video portion was fun: I’d done some I2S audio PIO work before, but here I wanted to scan out video and arbitrarily control Vsync/Hsync. Well, to test I needed a circuit. VGA wants 0.7V max on the video R,G,B signals and (mumble, some volts) on the syncs. The R,G,B signals are 75Ω to ground: with some maths (worked through below), a 3.3V GPIO driving all three through a 100Ω resistor is roughly right.

The day I started soldering it together I needed a VGA connector. I had a DB15 but wanted it for another project, and felt bad about cutting up a VGA cable. But when I took a walk at lunchtime, no shitting you, I passed some street cables. I had a VGA cable – the rust helps with the janky aesthetic.
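For completeness, the level maths referred to above, on the assumption that the single GPIO feeds all three 75Ω terminations through one shared series resistor (so the monitor’s inputs appear in parallel):

\[
R_\text{load} = 75\,\Omega \parallel 75\,\Omega \parallel 75\,\Omega = 25\,\Omega,
\qquad
V_\text{white} = 3.3\,\text{V} \times \frac{25}{100 + 25} \approx 0.66\,\text{V}
\]

which is close enough to the 0.7V full-white level.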
Free VGA cable

The VGA PIO side was pretty fun. It ended up as PIO reading config info dynamically to control Hsync width, display position, and so on, and then some tricks with DMA to scan out the config info interleaved with framebuffer data. By shifting the bits in the right direction and by using the byteswap option on the RP2040 DMA, the big-endian Mac framebuffer can be output directly without CPU-side copies or format conversion. Cool. This can be fairly easily re-used in other projects: see video.c.

But. I ended up (re)writing the video side three times in total:

The first version had two DMA channels writing to the PIO TX FIFO. The first would transfer the config info, then trigger the second to transfer video data, then raise an IRQ. The IRQ handler would then have a short time (the FIFO depth!) to choose a new framebuffer address to read from, and reprogram DMA. It worked OK, but was highly sensitive to other activity in the system. The first and most obvious fix is that any latency-sensitive IRQ handler must have the __not_in_flash_func() attribute so as to run out of RAM. But even with that, the design didn’t give much time to reconfigure the DMA: random glitches and blanks occurred when moving the mouse rapidly.

The second version did double-buffering with the goal of making the IRQ handler’s job trivial: poke in a pre-prepared DMA config quickly, then after the critical rush calculate the buffer to use for next time. Lots better, but still some glitches under high load. Even weirder, it’d sometimes just blank out completely, requiring a reset. This was puzzling for a while; I ended up printing out the PIO FIFO’s FDEBUG register to try to catch the bug in the act. I saw that the TXOVER overflow flag was set, and this should be impossible: the FIFOs pull data from DMA on demand with DMA requests and a credited flow-contr…OH WAIT. If credits get messed up or duplicated, too many transfers can happen, leading to an overflow at the receiver side. Well, I’d missed a subtle rule in the RP2040 DMA docs: “Another caveat is that multiple channels should not be connected to the same DREQ.”

So the third version…… doesn’t break this rule, and is more complicated as a result:

- One DMA channel transfers to the PIO TX FIFO.
- Another channel programs the first channel to send from the config data buffer.
- A third channel programs the first to send the video data.
- The programming of the first triggers the corresponding “next reprogram me” channel.

The nice thing – aside from no lock-ups or video corruption – is that this now triggers a Hsync IRQ during the video line scan-out, greatly relaxing the deadline of reconfiguring the DMA. I’d like to further improve this (with yet another DMA channel) to transfer without an IRQ per line, as the current IRQ overhead of about 1% of CPU time can be avoided. (It would’ve been simpler to just hardwire the VGA display timing in the PIO code, but I like – for future projects – being able to dynamically reconfigure the video mode.)

So now we have a platform and firmware framework to embed umac into, HID in and video out. The hardware’s done, fuggitthat’lldo, let’s throw it over to the software team:

How it all works

Back to emulating things

A glance at the native umac binary showed a few things to fix before it could run on the Pico:

- Musashi constructed a huge opcode decode jumptable at runtime, in RAM. It’s never built differently, and never changes at runtime. I added a Musashi build-time generator so that this table could be const (and therefore live in flash).
- The disassembler was large, and not going to be used on the Pico, so I added another option to build without it.
- Musashi tries to accurately count execution cycles for each instruction, with more large lookup tables. Maybe useful for console games, but the Mac doesn’t have the same degree of timing sensitivity. REMOVED.

(This work is in my small-build branch.)

pico-mac takes shape, with the ROM and disc image in flash, and enjoyably it now builds and runs on the Pico! With some careful attention to not shoving stuff in RAM, the RAM use is looking pretty good. The emulator plus HID code is using about 35-40KB on top of the Mac’s 128KB RAM area – there’s 95+KB of RAM still free.

This was a good time to finish off adding the keyboard support to umac. The Mac keyboard is interfaced serially through the VIA ‘shift register’, a basic synchronous serial interface. This was logically simple, but frustrating because early attempts at replying to the ROM’s “init” command were just persistently ignored. The ROM disassembly was super-useful again: reading the keyboard init code, it looked like a race condition in interrupt acknowledgement if the response byte appears too soon after the request is sent. I shoved in a delay to hold off a reply until a later poll, and then it was just a matter of mapping keycodes (boooooorrrriiiiing).

With a keyboard, the end-of-level MacWrite boss is reached:

One problem though: it totally sucked. It was suuuuper slow. I added a 1Hz dump of instruction count, and it was doing about 300 KIPS. The 68000 isn’t an amazing CPU in terms of IPC. Okay, there are some instructions that execute in 4 cycles. But you want to use those extravagant addressing modes don’t you, and touching memory is spending those cycles all over the place. I’m not an expert, but targeting about 1 MIPS for a roughly 8MHz 68000 seems right. Only a 3x improvement needed.

Performance

I didn’t say I wasn’t gonna cheat: let’s run that Pico at 250MHz instead of 125MHz. Okay, better, but not 2x better. From memory, only about 30% better. Damn, no free lunch today.

Musashi has a lot of configurable options. My first goal was to get its main loop (as seen from the disassembly/post-compile end!) small: the Mac doesn’t report Bus Errors, so the registers don’t need copies for unwinding. The opcodes are always fetched from a 16b boundary, so they don’t need alignment checking, and can use halfword loads (instead of two byte loads munged into a halfword!). For the Cortex-M0+/armv6m ISA, reordering some of the CPU context structure fields enabled immediate-offset access and better code. The CPU type, mysteriously, was dynamically-changeable and led to a bunch of runtime indirection.

Looking better, maybe a 2x improvement, but not enough. Missile Command was still janky and the mouse wasn’t smooth! Next, some naughty/dangerous optimisations: remove address alignment checking, because unaligned accesses don’t happen in this constrained environment. (This work is in my umac-hacks branch.)

But the real perf came from a different trick. First, a diversion!

RP2040 memory access

The RP2040 has fast RAM, which is multi-banked so as to allow generally single-cycle access to multiple users (2 CPUs, DMA, etc.). Out of the box, most code runs via XIP from external QSPI flash. The QSPI usually runs at the core clock (125MHz default), but has a latency of ~20 cycles for a random word read.
The RP2040 uses a relatively simple 16KB cache in front of the flash to protect you from horrible access latency, but the more code you have the more likely you are to call a function and have to crank up QSPI. When overclocking to 250MHz, the QSPI can’t go that fast so stays at 125MHz (I think). Bear in mind, then, that your 20ish QSPI cycles on a miss become 40ish CPU cycles.

The particular rock-and-a-hard-place here is that Musashi build-time generates a ton of code, a function for each of its 1968 opcodes, plus that 256KB opcode jumptable. Even if we make the inner execution loop completely free, the opcode dispatch might miss in the flash cache, and the opcode function itself too. (If we want to get 1 MIPS out of about 200 MIPS, a few of these delays are going to really add up.)

The __not_in_flash_func() attribute can be used to copy a given function into RAM, guaranteeing fast execution. At the very minimum, the main loop and memory accessors are decorated: every instruction is going to access an opcode and most likely read or write RAM. This improves performance a few percent. Then, I tried decorating whole classes of opcodes: move is frequent, as are branches, so put ‘em in RAM. This helped a lot, but the remaining free RAM was used up very quickly, and I wasn’t at my goal of much above 1 MIPS.

Remember that RISC architecture is gonna change everything? We want to put some of those 1968 68K opcodes into RAM to make them fast. What are the top 10 most often-used instructions? Top 100? By adding a 64K table of counters to umac, booting the Mac and running key applications (okay, playing Missile Command for a bit), we get a profile of dynamic instruction counts. It turns out that the 100 hottest opcodes (5% of the total) account for 89% of the execution. And the top 200 account for a whopping 98% of execution.

Armed with this profile, the umac build post-processes the Musashi auto-generated code and decorates the top 200 functions with __not_in_flash_func(). This adds only 17KB of extra RAM usage (leaving 95KB spare), and hits about 1.4 MIPS! Party on! At last, the world can enjoy Missile Command’s dark subject matter in performant comfort:

Missile Command on pico-mac

What about MacPaint?

Everyone loves MacPaint. Maybe you love MacPaint, and have noticed I’ve deftly avoided mentioning it. Okay, FINE: It doesn’t run on a Mac 128Ke, because the Mac Plus ROM uses more RAM than the original. :sad-face:

I’d seen this thread on 68kMLA about a “Mac 256K”: https://68kmla.org/bb/index.php?threads/the-mythical-mac-256k.46149/

Chances are that the Mac 128K was really a Mac 256K in the lab (or maybe even intended to have 256K and cost-cut before release), as the OS functions fine with 256KB. I wondered, does the Mac ROM/OS need a power-of-two amount of RAM? If not, I have that 95K going spare. Could I make a “Mac 200K”, and then run precious MacPaint?

Well, I tried a local hack that patches the ROM to update its global memTop variable based on a given memory size, and yes, System 3.2 is happy with non-power-of-2 sizes. I booted with 256K, 208K, and 192K. However, there were some additional problems to solve: the ROM memtest craps itself without a power-of-2 size (totally fair), and NOPping that out leads to other issues. These can be fixed, though some parts of boot also access off the end of RAM. A power-of-2 size means a cheap address mask wraps RAM accesses to the valid buffer, and that can’t be done with 192K.
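To make that last point concrete, here’s an illustrative sketch (not umac’s actual accessor) of why power-of-two sizes are cheap for an emulator:

```c
#include <stdint.h>

#define RAM_SIZE_POW2  (128 * 1024)   /* 128K: RAM_SIZE - 1 is an all-ones mask */
#define RAM_SIZE_192K  (192 * 1024)   /* 192K: no such mask exists */

static uint8_t ram[RAM_SIZE_192K];

/* Power-of-two: mirroring/wrapping of the whole RAM region is a single AND,
 * which also behaves like the real machine's partial address decode. */
static inline uint8_t read8_pow2(uint32_t addr)
{
    return ram[addr & (RAM_SIZE_POW2 - 1)];
}

/* 192K: every access needs a compare (or a slower modulo) to stay in bounds,
 * and this sits on the hottest path in the whole emulator. */
static inline uint8_t read8_192k(uint32_t addr)
{
    uint32_t off = addr % RAM_SIZE_192K;
    return ram[off];
}
```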
Unfortunately, when I then tested MacPaint it still wouldn’t run, because it wanted to write a scratch file to the read-only boot volume. This is totally breaking rule #1 by this point, so we are staying with 128KB for now. However, a 256K MicroMac is extremely possible. We just need an MCU with, say, 300KB of RAM… Then we’d be cooking on gas.

Goodbye, friend

Well, dear reader, this has been a blast. I hope there’s been something fun here for ya. Ring off now, caller!

The MicroMac! HDMI monitor, using a VGA-to-HDMI box

umac screenshot: System 3.2, Finder 5.3

Performance tuning

Random disc image working OK

Resources

https://github.com/evansm7/umac
https://github.com/evansm7/pico-mac
https://www.macintoshrepository.org/7038-all-macintosh-roms-68k-ppc-
https://winworldpc.com/product/mac-os-0-6/system-3x
https://68kmla.org/bb/index.php?threads/macintosh-128k-mac-plus-roms.4006/
https://docs.google.com/spreadsheets/d/1wB2HnysPp63fezUzfgpk0JX_b7bXvmAg6-Dk7QDyKPY/edit#gid=840977089

Classical virtualisation rules applied to RISC-style atomics

In 1974, Gerald Popek and Robert Goldberg published a paper, “Formal Requirements for Virtualizable Third Generation Architectures”, giving a set of characteristics for correct full-machine virtualisation. Today, these characteristics remain very useful. Computer architects will informally cite this paper when debating Instruction Set Architecture (ISA) developments, with arguments like “but that’s not Popek & Goldberg-compliant!”

In this post I’m looking at one aspect of computer architecture evolution since 1974, and observing how RISC-style atomic operations provide some potential virtualisation gotchas for both programmers and architects.

Principles of virtualisation

First, some virtualisation context, because it’s fun!

A key P&G requirement is that of equivalence: it’s reasonable to expect software running under virtualisation to have the same behaviour as running it bare-metal! This property is otherwise known as correctness. :-)

P&G classify instructions as being sensitive if they behave differently when running at a lower privilege level (i.e. the program can detect that it is being run in a different manner). An ISA is said to be classically virtualisable if:

- Sensitive instructions are privileged, and
- Privileged instructions executed at a lower privilege level can be trapped to a higher level of privilege.

For a classically-virtualisable system, perfect equivalence can then be achieved by running software at a lower than usual level of privilege, trapping all privileged/sensitive instructions, and emulating their behaviour in a VMM. That is, if the design of the ISA ensures that all “sensitive” instructions can be trapped, it’s possible to ensure the logical execution of the software cannot be different to running bare-metal. This virtualisation technique is called “privilege compression”.

Note: This applies recursively, running OS-level software with user privilege, or hypervisor-level software at OS/user privilege. Popek & Goldberg formalise this too, giving properties required for correct nested virtualisation.

System/360 and PowerPC are both classically virtualisable, almost as though IBM thought about this. ;-) Equivalent virtualisation can be achieved by:

- Running an OS in user mode (privilege compression, for CPU virtualisation),
- Catching traps (to supervisor mode/HV) when the guest OS performs a privileged operation,
- In the hypervisor, operating on a software-maintained “shadow” of what would have been the guest OS’s privileged CPU state were it running bare-metal.
- Constructing shadow address translations (for memory virtualisation).

Linux’s KVM support on PowerPC includes a “PR” feature, which does just this: for CPUs without hardware virtualisation, guests are run in user mode (or “PRoblem state” in IBM lingo).

Note: It is key that the hypervisor can observe and control all of the guest’s state.

Today, most systems address the performance impact of all of this trap-and-emulate by providing hardware CPU and memory virtualisation (e.g. user, OS and hypervisor execution privilege levels, with nested page tables). But, classically virtualisable ISA design remains important for clear reasoning about isolation between privilege levels and composability of behaviours.

Computers in 1974 were ~all CISC

All computers in 1974 were available in corduroy with a selection of Liberty-print input devices. All consoles had ashtrays (not even joking tbh).
Architecture-wise, IBM was working on early RISC concepts leading to the 801, but most of the industry was on a full-steam trajectory to peak CISC (VAX) in the late 1970s. It’s fair to say that “CISC” wasn’t even a thing yet; instruction sets were just complex. P&G’s paper considered three contemporary computers:

- IBM System/360
- Honeywell 6000
- DEC PDP-10

CISC atomic operations and synchronisation primitives

These machines had composite/“read-modify-write” atomic operations, similar to those in today’s x86 architectures. System/360 had compare-and-swap, locked operations (read-operate-write), test-and-set, and PDP-10 had EXCHange/swap. These kinds of instructions are not sensitive so, unless the addressed memory is privileged, atomic operations can be performed inside virtual machines without the hypervisor needing to know.

Atomic operations in RISC machines

Many RISC machines support multi-instruction synchronisation sequences built up around two instruction primitives:

- Load-and-set-reservation
- Store-conditional

MIPS called these load-linked (LL) and store-conditional (SC), and I’ll use these terms. ARMv8 has LDXR/STXR. PowerPC has LWARX/STWCX. RISC-V has LR/SC. Many machines (such as ARMv8-LSE) also add composite operations such as CAS or atomic addition but still retain the base LL/SC mechanism, and sizes/acquire/release variants are often provided.

The concept is that the LL simultaneously loads a value and sets a “reservation” covering the address in question, and a subsequent SC succeeds only if the reservation is still present. A conflicting write to the location (e.g. a store on another CPU) clears the reservation and the SC returns a failure value without modifying memory; LL/SC are performed in a loop to retry until the update succeeds. An LL/SC sequence can typically be arbitrarily complex – a lock routine might test a location is cleared and store a non-zero value if so, whereas an update might increment a counter or calculate a “next” value, and so on. Typically an ISA does not restrict what lies between LL and SC.

Coming back to virtualisation requirements, the definition of a reservation is interesting because it’s effectively “hidden state” that the hypervisor cannot manage. Typically, a hypervisor cannot easily read whether a reservation exists, and it can’t be saved/restored [1]. CISC-like RmW atomic operations do not exhibit this property.

Problem seen, problem felt

Shall I get to the point? I saw an odd but legal guest code sequence that can be difficult to virtualise.

I’ve been trying to run MacOS 9.2 in KVM-PR on a PowerPC G4, and observed that the NanoKernel acquire-lock routine happens to use a sensitive instruction (mfsprg) between a lwarx and stwcx. This is strange, and guarantees a trap to the host between the LL and SC operations. Though the guest should not be doing weird stuff when acquiring a lock, it’s still an architecturally-correct program. This means that if the reservation isn’t preserved across the trap, the lock is never taken. Forward progress is never achieved and virtualisation equivalence is not maintained (because the guest livelocks).

Specifically, if the reservation is always cleared on the trap, we have a problem. If it is sometimes kept, the guest program can progress.
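For illustration, the shape of such a sequence looks something like the following hand-written sketch (GCC-style PowerPC inline assembly; this is not the actual NanoKernel code, and the mfsprg here is simply a stand-in for “a sensitive instruction in the middle”):

```c
/* A lwarx/stwcx. lock acquire with a sensitive instruction inside the
 * LL/SC window.  Under privilege compression the mfsprg traps to the
 * hypervisor; if that trap always destroys the reservation, the stwcx.
 * can never succeed and the loop never exits. */
static inline void acquire_lock(volatile unsigned int *lock)
{
    unsigned int old, scratch;
    __asm__ __volatile__(
        "1: lwarx   %0,0,%2   \n\t"  /* LL: load lock word, set reservation   */
        "   cmpwi   %0,0      \n\t"
        "   bne-    1b        \n\t"  /* already held: spin                    */
        "   mfsprg  %1,0      \n\t"  /* sensitive: traps under KVM-PR         */
        "   stwcx.  %3,0,%2   \n\t"  /* SC: fails if the reservation is gone  */
        "   bne-    1b        \n\t"  /* reservation lost: retry (or livelock) */
        "   isync             \n\t"  /* acquire barrier                       */
        : "=&r"(old), "=&r"(scratch)
        : "r"(lock), "r"(1)
        : "cr0", "memory");
}
```

Whether that loop can ever exit under trap-and-emulate is exactly what the rest of this post worries about.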
Since the state is hidden (the hypervisor can’t save/restore/re-create it), correctness depends on two things:

- The hypervisor’s exception-emulation-return path not itself clearing the reservation every time, for any possible trap.
- The ISA and hardware implementation guaranteeing the reservation is not always cleared by hardware.

This potential issue isn’t limited to PPC or the MacOS guest.

Software guarantees

The hypervisor must guarantee two things:

- It must not intentionally clear reservations on all traps.
- It must not accidentally do so as a side-effect of a chosen activity: for example, using its own synchronisation primitives elsewhere, or by writing memory that would conflict with the guest’s reservation.

This can be challenging: context switching must be avoided in the T&E handler (no sleep or pre-emption), and it can’t take locks. In my MacOS guest experiment, KVM-PR does not happen to currently use any synchronisation primitives on its emulation path – ew, delicate – but I had tracing on, which does. The guest locked up.

Hardware guarantees

But does your CPU guarantee that reservations aren’t always cleared? [2] That seems to depend. This morning’s light reading gives:

PowerPC architecture

PowerISA is comparatively clear on the behaviour (which isn’t surprising, as PowerISA is generally very clearly-specified). PowerISA v3.1 section 1.7.2.1 describes reservations, listing specific reasons for reservation loss. Some are the expected “lose the reservation if someone else hits the memory” reasons, but previous PowerISAs (e.g. 2.06) permitted embedded implementations to clear the reservation on all exceptions. This permission was removed by 3.1; in my opinion a good move. (I did just this, for reasons, in my homebrew PowerPC CPU, oops!)

PowerISA does permit spontaneous reservation loss due to speculative behaviour, but is careful to require that forward progress is guaranteed (i.e. that an implementation doesn’t happen to clear the reservation every time for a given piece of code). Finally, it includes a virtualisation-related programming note stating a reservation may be lost if software executes a privileged instruction or utilizes a privileged facility (i.e. sensitive instructions). This expresses intent, but isn’t specification: it doesn’t criminalise a guest doing wrong things unless it’s a rule that was there from the dawn of time. At any rate, this post is going to be old news to the PowerISA authors. Nice doc, 8/10, good jokes, would read again.

RISC-V architecture

The lack of any guest legacy permits the problem to be solved from the other direction. Interestingly, the RISC-V ISA explicitly constrains the instruction sequences between LR/SC: “The dynamic code executed between the LR and SC instructions can only contain instructions from the base ‘I’ instruction set, excluding loads, stores, backward jumps, taken backward branches, JALR, FENCE, FENCE.I, and SYSTEM instructions.”

This is a good move. Tacitly, this bans sensitive instructions in the critical region, and permits an absence of progress if the guest breaks the rules. Ruling out memory accesses is interesting too, because it can be useful for a hypervisor to be able to T&E any given page in the guest address space without repercussions.

Reservation granule size

An LL operation is usually architecturally permitted to set an address-based reservation with a size larger than the original access, called the “reservation granule”.
A larger granule reduces tracking requirements but increases the risk of a kind of false sharing between locks, where an unrelated CPU taking an unrelated lock could clear your CPU’s reservation.

This is important to our hypervisor, because of guarantee #2 above: when emulating a sensitive instruction it must not access anything that always causes the reservation to clear. You would hope the guest doesn’t soil itself by executing an instruction against its interests, so we can assume the guest won’t intentionally direct the hypervisor to hit on shared addresses, but if hypervisor and guest memory could ever coexist within a reservation granule there is scope for conflict.

PowerPC defines the largest granule as, effectively, the (small) page size. ARM defines it as 4KB (effectively, the same). It’s a reasonable architectural assumption that guest and host memory is disjoint at page size granularity. RISC-V permits the reservation granule to be unlimited, which isn’t great [3] – but it later notes that “a platform specification may constrain the size and shape of the reservation set. For example, the Unix platform is expected to require of main memory that the reservation set be of fixed size, contiguous, naturally aligned, and no greater than the virtual memory page size.”

Conclusion

An ISA cannot be classically virtualised if it permits some aspect of trapping or emulation (such as the exception itself) to always cause a reservation to be cleared, unless sensitive instructions are prohibited from any region dependent on a reservation.

In terms of computer science, it’s quite unsatisfying that it’s possible to have a sequence of RISC instructions that cannot be classically virtualised due to hidden state. In practical terms, trap-and-emulate is alive and well in systems supporting nested virtualisation. Although some ISAs provide a level of hardware support for NV, it tends to be assists to speed up use of privilege compression rather than more exception levels and more translation stages (which, to be fair, would be awful). Consequently there is always something hypervisor-privileged being trapped to the real hypervisor, i.e. T&E is used in anger. So, there are some hardware behaviours which must (continue to be) guaranteed and, unfortunately, some constraints on already-complex software which must be observed.

I thought this small computer architecture safari might be interesting to others, and hope you enjoyed the read!

Footnotes

[1] In theory an ISA could provide the hypervisor with a previous reservation’s address, but re-creating it with a later LL raises ordering model questions!

[2] Sorry for the double-negative, but this alludes to the possibility of architecture permissions (for example, statements like “X is permitted to spontaneously happen at any time”) leading to implementations taking convenient liberties such as “always do X when any cache line is fetched”. If these decisions were to exist, they would be impossible to avoid stepping on, even with a carefully-written hypervisor.

[3] It would be terrible to permit an implementation to allow all hypervisor memory accesses to clear the reservation!

A small ode to the CRT

Built October 2018

I used to hate Cathode Ray Tubes. As a kid in Europe, everything flickered at 50Hz, or made a loud whistle at 15.625kHz (back when I could still hear it). CRTs just seemed crude, “electro-brutalist” contraptions from the valve era. They were heavy, and delicate, and distorted, and blurry, and whistled, and gave people electric shocks when they weren’t busy imploding and spreading glass shards around the place. When I saw the film Brazil, I remember getting anxious about exposed CRTs all over the place — seems I was the kind of kid who was more worried about someone touching the anode or electron gun than the totalitarian bureaucratic world they lived in. 🤷🏻‍♂️ As ever, I digress.

Now in the 2020s, the CRT is pretty much gone. We have astonishing flat-panel LCD and OLED screens. Nothing flickers, everything’s pin-sharp, multi-megapixel resolutions, nothing whines (except me), and display life is pretty incredible for those of us old enough to remember green-screen computing (but young enough to still see the details).

But, the march to betterness marches away from accessible: if you take apart a phone, the LCD is a magic glass rectangle, and that’s it. Maybe you can see some LEDs if you tear it apart, but it’s really not obvious how it works. CRTs are also magic, but in a pleasing 19th century top-hat-and-cane science kind of way. Invisible beams trace out images through a foot of empty space. They respond colourfully to magnets (also magic) held to their screens by curious children whose glee rapidly decays into panic while they try to undo the effect using the other pole before their mother looks around and discovers what they’ve done (allegedly).

The magnet-game is a clue: (most) CRTs use electromagnets that scan the invisible electron beam to light an image at the front. There’s something enjoyable about moving the beam yourself, with a magnet in hand, and you can kind of intuitively figure out how it works from doing this. (Remember the Left-hand Rule?)

I started to warm to CRTs – maybe a fondness born when I realised I hadn’t had to seriously use one for over a decade. I wanted to build something. I also like smol displays, and found an excellent source for a small CRT — a video camera viewfinder. Home cameras had tiny CRTs, roughly 1cm picture size, but I looked to find a higher-end professional viewfinder because they tended to have larger tubes for a higher-quality image. Eventually I found a Sony HVF-2000 viewfinder, from ca. 1980. This viewfinder contained a monochrome 1.5” CRT, and its drive circuitry on stinky 1970s phenolic resin PCBs. All it needs are two turntables and an 8V DC power supply and composite video input. It displays nice, sharp images on a cool white phosphor. I built this from it:

Small CRT floating in a box

I wanted to show the CRT from all angles, without hiding any of it, in the trusty “desktop curiosity” style. The idea was to show off this beautiful little obsolete glass thingy, in a way that you could sorta guess how it worked. Switching it on with a pleasing clack, it starts silently playing a selection of 1980s TV shows, over and over and over:

I had this on my desk at work, and a Young Person™ came into my office one day to ask about it. He hadn’t really seen a CRT close-up before, and we had a fun chat about how it worked (including waving a magnet at it – everyone has a spare magnet on their desk for these moments, don’t they? Hello…?). Yay!

If you’re unfamiliar with CRTs, they work roughly like this:

- The glass envelope contains a vacuum.
- The neck contains a heating filament (like a lightbulb) which gives off electrons into the void. This “electron gun” is near some metal plates (with variously high positive and negative voltages), which act to focus the fizz of electrons into a narrow beam, directing it forward.
- The inside of the front face of the tube is covered by a phosphorescent material which lights up when hit with electrons.
- The front face is connected to the anode terminal, a high positive voltage. This attracts the beam of electrons, which accelerate to the front. The beam hits the front and creates light in a small spot.
- To create the picture, the beam is steered in rasters/lines using horizontal and vertical electromagnets wrapped around the neck of the tube. (The magnets are called the “yoke”.) For PAL at 50Hz, lines are drawn 15625 times a second. Relying on the principle of persistence of vision, this creates the illusion of a steady image.

The tube is sealed and the electron gun inside is largely invisible, but here you can see the malicious-looking thick anode wire, and how dainty the tube really is with the yoke removed:

Note: the anode voltage for this tube is, from memory, about 2.5 kilovolts, so not particularly spicy. A large computer monitor will give you 25kV! Did I mention the X-rays?

Circuit

The original viewfinder was a two-board affair, fitting in a strange transverse shape for the viewfinder case. I removed a couple of controls and indicators unrelated to the CRT operation, and extended the wires slightly so the boards could be stacked. The viewfinder’s eyepiece looks onto a mirror, turning 90º to the CRT face — so the image is horizontally flipped. This was undone by swapping the horizontal deflection coil wires, reversing the field direction.

The circuit’s pretty trivial. It just takes a DC input (9-12V) and uses two DC-DC converter modules to create an 8V supply for the CRT board and a 5V supply for a Raspberry Pi Zero layered at the bottom. The whole thing uses under 2W. The Pi’s composite output drops straight into the CRT board. The Pi starts up a simple shell script that picks a file to play. There’s a rotary encoder on the back, to change channel, but I haven’t wired it up yet.

Case

For me, the case was the best bit. I had just got (and since lost :((( ) access to a decent laser cutter, and wanted to make a dovetailed transparent case for the parts. It’s made from 3mm colourless and sky-blue acrylic.

Rubber bands make the world go round

The CRT is supported from two “hangers”, and two trays below hold the circuitry. These are fixed to the sides using a slot/tab approach, with captive nuts. In the close-up pictures you can see there are some hairline stress fractures around the corners of some of the tab cut-outs: they could evidently do with being a few hundred µm wider!

The front/top/back/bottom faces are glued together, then the left/right sides are screwed into the shelves/hangers with captive M3 nuts. This sandwiches it all together. The back holds a barrel-style DC jack, power switch, and (as-yet unused) rotary encoder. The encoder was intended to eventually be a kind of “channel select”:

The acrylic is a total magnet for fingerprints and dust, which is excellent if you’re into that kind of thing. There also seem to be little flecks filling the case, probably some aquadag flaking off the CRT. This technology just keeps on giving.
OpenSCAD

The case is designed in OpenSCAD, and is somewhat parameterised: the XYZ dimensions, dovetailing, spacing of shelves and so forth can be tweaked till it looks good. One nice OpenSCAD laser-cutting trick I saw is that 2D parts can be rendered into a “preview” 3D view, tweaked and fettled, and then re-rendered flat on a 2D plane to create a template for cutting.

So, make a 3D prototype, change the parameters until it looks good (maybe printing stuff out to see whether the physical items actually fit!)…

…then change the mode variable, and the same parts are laid out in 2D for cutting:

Feel free to hack on and re-use this template.

Resources

OpenSCAD box sources
Pics

Tiny dmesg!

Edmund Esq

Mac SE/30 odyssey

I’ve always wanted an Apple Macintosh SE/30. Released in 1989, they look quite a lot like the other members of the original “compact Mac” series, but pack in a ton of interesting features that the other compact Macs don’t have. This is the story of my journey to getting to the point of owning a working Mac SE/30, which turns out not to be as simple as just buying one. Stay tuned for tales of debugging and its repair.

So, the Mac. Check it out, with the all-in-one-style 9” monochrome display:

The beautiful Macintosh SE/30

I mean, look at it, isn’t it lovely? :)

The key technical difference between the SE/30 and the other compact Macs is that the SE/30 is much much less crap. It’s like a sleeper workstation, compared to the Mac Plus, SE, or Classic.

- 8MHz 68K? No! ~16MHz 68030.
- Emulating FP on a slow 68K? No! It ships with a real FPU!
- Limited to 4MB of RAM? Naw, this thing takes up to 128MB!

Look, I wouldn’t normally condone use of CISC machines (and – unpopular opinion – I’m not actually a 68K fan :D ), but not only does this machine have a bunch of capability RAM-wise and CPU-wise, it also has an MMU. In my book, MMUs make things interesting (as well as ‘interesting’). Unlike all the other compact Macs, this one can run real operating systems like BSD, and Linux. And, I needed to experience A/UX first-hand.

Unpopular opinion #2: I don’t really like ye olde Mac OS/System 7 either! :) It was very cool at the time, and made long-lasting innovations, but lack of memory protection or preemptive scheduling made it a little delicate. At the time, as a kid, it was frustrating that there was no CLI, or any way to mess around and program them without expensive developer tools – so I gravitated to the Acorn Archimedes machines, and RISC OS (coincidentally with the same delicate OS drawbacks), which were much more accessible programming-wise.

Anyway, one week during one of the 2020 lockdowns I was reminded of the SE/30, and got a bit obsessed with getting hold of one. I was thinking about them at 2am (when I wasn’t stressing about things like work), planning which OSes to try out, which upgrades to make, how to network it, etc. Took myself to that overpriced auction site, and bought one from a nearby seller.

We got one!

I picked it up. I was so excited. It was a good deal (hollow laugh from future-Matt), as it came in a shoulder bag and included mouse/keyboard, an external SCSI drive and various cables. Getting it into the car, I noticed an OMINOUS GRITTY SLIDING SOUND.

Oh, did I mention that these machines are practically guaranteed to self-destruct because either the on-board electrolytic caps ooze out gross stuff, or the on-board Varta lithium battery poos its plentiful and corrosive contents over the logic board?

[If you own one of these machines or, let’s face it, any machine from this era, go right now and remove the batteries if you haven’t already! Go on, it’s important. (I’m also looking at you, Acorn RISC PC owners.) I’ll wait.]

I opened up the machine, and the first small clue appeared:

Matt: Oh. That’s not a great omen.
Matt, with strained optimism: “But maybe the logic board will be okay!”
Mac SE/30: “Nah mate, proper fucked sry.”
Matt: :(

At this point I’d like to say that the seller was a volunteer selling donated items from a charity shop, and it was clear they didn’t really know much about the machine. It was disappointing, but the money paid for this one is just a charitable donation and I’m happy at that.
(If it were a private seller taking money for a machine that sounded like it washed up on a beach, it’d be a different level of fury.)

Undeterred (give it up, Matt, come on), I spent a weekend trying to resurrect it. Much of the gross stuff washed off, bathing it in a sequence of detergents/vinegar/IPA/etc:

You can see some green discolouring of the silkscreen in the bottom right. Submerged in (distilled) water, you can see a number of tracks that vanish halfway, or have disappeared completely. Or, components whose pads and leads have been destroyed! The battery chemicals are very ingenious; they don’t just wash like lava across the board and destroy the top, but they also wick down into the vias, and capillary action seems to draw them into the inner layers.

Broken tracks, missing pads, missing components, missing vias

Poring over schematics and beeping out connections, I started airwiring the broken tracks (absolutely determined to get this machine running, as though it were some perverse challenge). But, once I found broken tracks on the inner layers, it moved from perverse to Sisyphean because I couldn’t just see where the damage was: wouldn’t even finding the broken tracks by beeping out all connections be O(intractable)?

Making the best decision so far in the odyssey, I gave up and searched for another SE/30. At least I got a spare keyboard and mouse out of it. But also, a spare enclosure/CRT/analog board, etc., which will be super-useful in a few paragraphs.

Meet the new Mac, same as the old Mac

I found someone selling one who happened to be in the same city (and it turns out, we even worked for the same company – city like village). This one was advertised as having been ‘professionally re-capped’, and came loaded: 128MB of RAM, and a sought-after Ethernet card. Perfecto!

Paranoid me immediately took it apart to check the re-capping and battery. :) Whilst there was a teeny bit of evidence of prior capacitor leakage, it was really clean for a 31 year old machine and I was really pleased with it. The re-capping job looked sensible, check. The battery looked new, but I was taking no chances this time and pulled it out.

I had a good 2 hours merrily pissing about doing the kinds of things you do with a new old computer, setting up networking and getting some utilities copied over, such as a Telnet client:

Telnet client, life is complete

Disaster strikes

After the two hour happiness timer expired, the machine stopped working. Here’s what it did:

Otherwise, the Mac made the startup “bong” sound, so the logic board was alive, just unhappy video. I think we’re thinking the same thing: the CRT’s Y-deflection circuit is obviously broken. This family of Macs has a common fault where solder joints on the Analogue board crack, or the drive transistor fails. The excellent “Dead Mac Scrolls” book covers common faults, and fixes.

But, remember the first Mac: the logic board was a goner, but the Analog board/CRT seemed good. I could just swap the logic board over, and I’ve got a working Mac again and can watch the end of Telnet Star Wars.

It did exactly the same thing! Bollocks, the problem was on the logic board.

Debugging the problem

We were both wrong: it wasn’t the Y-deflection circuit for the CRT. The symptoms of that would be that the CRT scans, but all lines get compressed and overdrawn along the centre – no deflection, creating one super-bright line in the centre.

Debug clues

Clue 1: This line wasn’t super-bright.
Let’s take a closer look:

Clue 2: It’s a dotted line, as though it’s one line of the stippled background when the Mac boots. That’s interesting because it’s clearly not being overdrawn; multiple lines merged together would overlay even/odd odd/even pixels and come out solid white. The line also doesn’t provide any of the “happy Mac” icon in the middle, so it isn’t one of the centre lines of the framebuffer.

SE/30 logic board on The Bench, provided with +5V/+12V and probed with scope/LA

If you’ve an SE/30 (or a Classic/Plus/128/512 etc.) logic board on a workbench, they’re easy enough to power up without the Analog board/CRT, but be aware the /RESET circuitry is a little funky. Reset is generated by the sound chip (…obviously) which requires both +5V and +12V to come out of reset, so you’ll need a dual-rail bench supply. I’d also recommend plugging headphones in, so you can hear the boot chime (or lack of it) as you tinker. Note the audio amp technically requires -5V too, but with +5V alone you should still be able to hear something.

This generation of machines is one of the last to have significant subsystems still implemented as multi-chip sections. It’s quite instructive to follow along in the schematic: the SE/30 video system is a cluster of discrete X and Y pixel counters which generate addresses into VRAM (which spits out pixels). Some PALs generate VRAM addresses/strobes/refresh, and video syncs.

Clue 3: The video output pin on the chonky connector is being driven, and HSYNC is running correctly (we can deduce this already, though, because the CRT lights up, meaning its HT supply is running, and that’s driven from HSYNC). But, there was no VSYNC signal at all.

VSYNC comes from a PAL taking a Y-count from a counter clocked by 'TWOLINE'

Working backwards, I traced VSYNC from the connector to PAL UG6. It wasn’t simply a broken trace; UG6 wasn’t generating it. UG6 appears to be a comparator that generates vertical timing strobes when the Y line count VADR[7:0] reaches certain lines. The Y line count is generated from a dual hex counter, UF8.

Clue 4: The Y line count wasn’t incrementing at all. That explains the lack of VSYNC, as UG6 never saw the “VSYNC starts now” line come past. The UF8 counter is clocked/incremented by the TWOLINE signal output from PAL UG7.

Clue 5a: PAL UG7’s TWOLINE output was stuck/not transitioning. Its other outputs (such as HSYNC) were transitioning fine. PALs do die, but it seems unusual for only a single output to conk out.

Clue 5b: PAL UG7 was unusually hot!

Clue 6, and the root problem: Pulling the PALs out, the TWOLINE pin measures 3Ω to ground. AHA!

Debug epiphany

Something is shorting the TWOLINE signal to a power rail. Here’s how the clues correspond to the observations:

- There is no VSYNC; the Y line count is stuck at 0.
- The X counter is working fine. (HSYNC is produced, and a stippled pattern line is displayed correctly.)
- The display shows the top line of the video buffer (from address 0, over and over) but never advances onto the next line.
- The CRT Y deflection is never “charged up” by a VSYNC, so the raster stays in the centre on one line, instead of showing 384 identical lines.

We can work with this. TWOLINE is shorted somehow. Tracing it across the PCB, every part of the trace looked fine, except I couldn’t see the part that ran underneath C7 (one of the originally-electrolytic caps replaced with a tantalum). I removed C7:

See the problem? It’s pleasingly subtle…

How about now?
A tiny amount of soldermask has come off the track just south of the silkscreen ‘+’. This was Very Close to the capacitor’s contact, and was shorting against it! Above I thought it was shorting to ground: it’s shorting to +5V (which, when you measure it, can read as a low number of ohms to ground). My theory is that it wasn’t completely contacting, or wasn’t making a good connection, and that the heat from my 2-hour joyride expanded the material such that it made good contact. You can see that there’s some tarnish on the IC above C7 – this is damage from the previous C7 leaking. This, or the re-capping job, lifted the insulating soldermask, leading to the short. Fixed The fix was simple: add some insulation using kapton tape and replace the capacitor: After that, I could see VSYNC being produced! But would it work? The sweet 1bpp stippled smell of success Yasssssss! :) Time to put it all back together, trying not to touch or break the CRT. And now for something completely different, but eerily familiar I mentioned I wanted this particular model because it could run “interesting OSes”. Did you know that, way before NeXT and OS X, Apple was a UNIX vendor? Apple A/UX operating system I’ve always wanted to play with Apple’s A/UX. By version 3.1, it had a very highly-integrated Mac OS ‘Classic’ GUI running on a real UNIX. It's like Mac OS, but... there's a UNIX dmesg too? It’s not X11 (though an X server is available); it really is running the Mac Toolbox etc., and it seems to have some similarities with the later OS X Blue Box/Classic environment in that it runs portions of Mac OS as a UNIX process. In the same way as OS X + Blue Box, A/UX will run unmodified Mac OS applications. The Finder is integrated with the UNIX filesystems in both directions (i.e. from a shell you can manipulate Mac files). These screenshots don’t do it justice, but there are good A/UX screenshots elsewhere. As an OS geek, I’m really impressed with the level of integration between the two OSes! It’s very thorough. Since the usual UNIX development tools are available, there’s a bit of cognitive dissonance in being able to “program a Mac” right out of the box: A/UX example application I mean, not just building normal UNIX command-line apps with cc/make etc., but the development examples include Mac OS GUI apps as well! It’s truly living in the future™. Plug for RASCSI Playing with ancient machines and multiple OSes is pretty painful when using ancient SCSI discs because: old discs don’t work; old discs are small; transferring stuff to and from discs means plugging them into your Linux box and… I don’t have SCSI there; and old discs don’t work, will pretend to, and then screw up and ruin your week. I built a RASCSI adapter (write-up and PCB posting TBD), software and circuit originally by GIMONS. This is a Raspberry Pi adapter that allows a userspace program to bit-bang the SCSI-I protocol, serving emulated disc/CD-ROM images from SD card. It works beautifully on the SE/30, and lets it both have several discs present at once, and switch between images quickly. Homemade RASCSI clone, SCSI emulator for Raspberry Pi The end, seeeeeeya! Resources https://archive.org/details/mac_The_Dead_Mac_Scrolls_1992 https://winworldpc.com/product/a-ux/3x https://68kmla.org/bb/index.php

over a year ago 23 votes
32-bit hat, with LEDs

Built in November 2015 (now-traditional multi-year writeup delay applied) A hat, bejewelled with 38 RGB LEDs Is this thing on..? It’s been a while since I’ve written one of these. So, the hat. It’s been on the writeup pile for almost 6 years, nagging away. Finally it’s its time to shine! NO PUN ESCAPES Anyway, the hat. It seemed like a good idea, and I even wore it out dancing. I know, so cool. This hat had been through at least two fancy-dress events, and had a natty aftermarket band mod even before the LEDs. Long story short: I got a hat, put a battery, an ARM Cortex-M0 microcontroller and an accelerometer in it, and a strip of full-colour RGB LEDs around it. The LEDs then react to movement, with an effect similar to a spirit level: as it tilts, a spark travels to the highest point. The spark rolls around, fading out nicely. Hardware Pretty much full bodge-city, and made in a real rush before a party. Parts: Charity shop Trilby (someone’s going to correct me that this is not an ISO standard Trilby and is in fact a Westcountry Colonel Chap Trilby, or something). Bugger it – a hat. A WS2812B strip of 38 LEDs. 38 is what would fit around the hat. Cheapo ADXL345 board. Cheapo STM32F030 board (I <3 these boards! So power, such price wow). Cheapo Li-Ion charging board and 5V step-up module all-in-one (AKA “powerbank board”). Li-Ion flat/pouch-style battery. Obviously some hot glue in there somewhere too. No schematic, sorry; it was quite freeform. The battery is attached to the charging board. That connects to the rest of the system via a 0.1” header/disconnectable “power switch” cable. The 5V power then directly feeds the LED strip and the Cortex-M0 board (which then generates 3.3V itself). The ADXL345 accelerometer is joined directly to the STM32 board at what was the UART header, which is configured for I2C: The STM32 board is also stripped of any unnecessary or especially pointy parts, such as jumpers/pin headers, to make it as flat and pain-free as possible. The LED strip is bent into a ring and soldered back onto itself. 5V and ground are linked at the join, whereas DI enters at the join and DO is left hanging. This is done for mechanical stability, and can’t hurt for power distribution either. Here’s the ring in testing: The electronics are mounted in an antistatic bag (with a hole for the power “switch” header pins, wires, etc.), and the bag sewn into the top of the hat: The LED ring is attached via a small hole, and sewn on with periodic thread loops: Software The firmware goes through an initial “which way is up?” calibration phase for the first few seconds, where it: Lights a simple red dotted pattern to warn the user it’s about to sample which way is up, so put it on quick and stand as naturally as you can with such exciting technology on your head, Lights a simple white dotted pattern, as it measures the “resting vector”, i.e. which way is up. This “resting vector” is thereafter used as the reference for determining whether the hat is tilted, and in which direction. Tilt direction vectors The main loop’s job is to regulate the rate of LED updates, read the accelerometer, calculate a position to draw a bright spark “blob”, and update the LEDs. The accelerometer returns a 3D vector of a force; when not being externally accelerated, the vector represents the direction of Earth’s gravity, i.e. ‘down’.
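That resting-vector calibration step looks roughly like the sketch below. This isn't the LEDHat source: accel_read_xyz() and delay_ms() are hypothetical stand-ins for the real ADXL345 read and delay routines, and the sample count and scaling are illustrative.

```c
/* Minimal sketch of the "which way is up?" calibration phase, assuming a
 * hypothetical accel_read_xyz() that returns one raw ADXL345 sample.
 */
#include <stdint.h>

typedef struct { int32_t x, y, z; } vec3_t;

extern void accel_read_xyz(vec3_t *out);   /* assumed: raw accelerometer sample */
extern void delay_ms(unsigned ms);         /* assumed: crude busy-wait */

static vec3_t resting;                     /* "down", measured at power-on */

void calibrate_resting_vector(void)
{
    vec3_t acc, sum = {0, 0, 0};
    int i;

    /* Average a couple of seconds of samples while the wearer stands still;
     * the mean becomes the reference gravity ("resting") vector.
     */
    for (i = 0; i < 64; i++) {
        accel_read_xyz(&acc);
        sum.x += acc.x;
        sum.y += acc.y;
        sum.z += acc.z;
        delay_ms(30);
    }
    resting.x = sum.x / 64;
    resting.y = sum.y / 64;
    resting.z = sum.z / 64;
}
```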
Trigonometry is both fun and useful Roughly, the calculations that are performed are: Relative to “vertical” (approximated by the resting vector), calculate the hat’s tilt in terms of the angle of the measured vector to vertical, and its bearing relative to “12 o’clock” in the horizontal (XY) plane. Convert the bearing of the vector into a position in the LED hoop. Use the radius of the vector in the XY plane as a crude magnitude, scaling up the spark intensity for a larger tilt. (There’s a rough sketch of these calculations at the end of this post.) All this talk of tilt and gravity vectors assumes the hat isn’t being moved (i.e. worn by a human). It doesn’t correct for the fact that the hat is likely actually accelerating, rather than sitting static at a tilt, but hey, this is a hat with LEDs and not a rocket. It is incorrect and looks good. Floating-point I never use floating point in any of my embedded projects. I’m a die-hard fixed-point kind of guy. You know where you are with fixed point. Sooo anyway, the firmware uses the excellent Qfplib, from https://www.quinapalus.com/qfplib-m0-tiny.html. This provides tiny single-precision floating point routines, including the trigonometric routines I needed for the angle calculations. Bizarrely, with an embedded hat on, it was way easier using gosh-darnit real FP than it was to do the trigonometry in fixed point. Framebuffer The framebuffer is only one dimensional :) It’s a line of pixels representing the LEDs. Blobs are drawn into the framebuffer at a given position, and start off “bright”. Every frame, the brightness of all pixels is decremented, giving a fade-out effect. The code drawing blobs uses a pre-calculated colour look-up table, to give a cool white-blue-purple transition to the spark. Driving the WS2812B RGB LEDs The WS2812B LEDs take a 1-bit stream of data encoding 24b of RGB data per LED, in fixed-time bit frames where the relative timing of the rising/falling edges gives a 0 or a 1 bit. The code uses a timer in PWM mode to output a 1/0 data bit, refilled from a neat little DMA routine. Once a framebuffer has been drawn, the LEDs are refreshed. For each pixel in the line, the brightness bits are converted into an array of timer values, each representing a PWM period (therefore a 0-time or a 1-time). A double-buffered DMA scheme is used to stream these values into the timer PWM register. This costs a few bytes of memory for the intermediate buffers, and is complicated, but has several advantages: It’s completely flicker-free and largely immune to any other interrupt/DMA activity compared to bitbanging approaches. It goes on in the background, freeing up CPU time to calculate the next frame. Though the CPU is pretty fast, this allows LEDHat to update at over 100Hz, giving incredibly fluid motion. Resources Firmware sourcecode: https://github.com/evansm7/LEDHat
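As promised above, here is a rough sketch of the per-frame maths and the framebuffer fade. It is illustrative rather than taken from the LEDHat source: the real firmware uses Qfplib's routines where plain atan2f()/sqrtf() appear here, it assumes the resting vector lies roughly along Z with components scaled to g, and the fade constants and colour handling are simplified stand-ins for the white-blue-purple LUT.

```c
/* Sketch of: gravity vector -> bearing -> LED index -> blob with fade. */
#include <math.h>
#include <stdint.h>

#define NUM_LEDS 38
#define PI_F     3.14159265f

typedef struct { uint8_t r, g, b; } pixel_t;
static pixel_t fb[NUM_LEDS];               /* one-dimensional framebuffer */

/* Map the measured gravity vector (in g, resting vector assumed ~Z) to an
 * LED index and a crude tilt magnitude. Where "12 o'clock" falls depends
 * on how the board is mounted in the hat.
 */
static void tilt_to_blob(float ax, float ay, float az,
                         int *led, float *magnitude)
{
    float bearing = atan2f(ay, ax) + PI_F;   /* 0..2*pi in the XY plane */
    (void)az;                                /* unused in this simplification */

    *led = (int)(bearing * NUM_LEDS / (2.0f * PI_F)) % NUM_LEDS;
    *magnitude = sqrtf(ax * ax + ay * ay);   /* XY radius ~ tilt amount */
}

/* Per frame: fade everything a little, then draw the spark. */
static void render_frame(int led, float magnitude)
{
    int i;
    uint8_t bright = (uint8_t)(magnitude > 1.0f ? 255 : magnitude * 255.0f);

    for (i = 0; i < NUM_LEDS; i++) {         /* decay gives the fade-out tail */
        if (fb[i].r) fb[i].r -= fb[i].r / 8 + 1;
        if (fb[i].g) fb[i].g -= fb[i].g / 8 + 1;
        if (fb[i].b) fb[i].b -= fb[i].b / 8 + 1;
    }
    if (bright > fb[led].b) {                /* white-ish spark placeholder */
        fb[led].r = bright / 2;
        fb[led].g = bright / 2;
        fb[led].b = bright;
    }
}
```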

over a year ago 18 votes

More in technology

Greatest Hits

I’ve been blogging now for approximately 8,465 days since my first post on Movable Type. My colleague Dan Luu helped me compile some of the “greatest hits” from the archives of ma.tt; perhaps some posts will stir some memories for you as well: Where Did WordCamps Come From? (2023) A look back at how Foo …

21 hours ago 2 votes
Let's give PRO/VENIX a barely adequate, pre-C89 TCP/IP stack (featuring Slirp-CK)

Years ago I bought TCP/IP Illustrated (what would now be called the first edition, prior to the 2011 update) for a hundred-odd bucks on sale; it has now sat on my bookshelf, encased in its original shrinkwrap, for at least twenty years. It would be fun to put up the 4.4BSD data structures poster it came with, but that would require opening it. Fortunately, today we have many more excellent and comprehensive documents on the subject, and more importantly, we've recently brought back up an oddball platform that doesn't have networking either: our DEC Professional 380 running the System V-based PRO/VENIX V2.0, which you met a couple of articles back. The DEC Professionals are a notoriously incompatible member of the PDP-11 family and, short of DECnet (DECNA) support in its unique Professional Operating System, there's officially no other way you can get one on a network — let alone the modern Internet. Are we going to let that stop us? Crypto Ancienne proxy for TLS 1.3. And, as we'll discuss, if you can get this thing on the network, you can get almost anything on the network! Easily portable and painfully verbose source code is included. Recall from our lengthy history of DEC's early misadventures with personal computers that, in Digital's ill-advised plan to avoid the DEC Pros cannibalizing low-end sales from their categorical PDP-11 minicomputers, Digital's Small Systems Group deliberately made the DEC Professional series nearly totally incompatible despite the fact they used the same CPUs. In their initial roll-out strategy in 1982, the Pros (as well as their sibling systems, the Rainbow and the DECmate II) were only supposed to be mere desktop office computers — the fact the Pros were PDP-11s internally was mostly treated as an implementation detail. The idea backfired spectacularly against the IBM PC when the Pros and their promised office software failed to arrive on time, and in 1984 DEC retooled around a new concept of explicitly selling the Pros as desktop PDP-11s. This required porting operating systems that PDP-11 minis typically ran: RSX-11M Plus was already there as the low-level layer of the Professional Operating System (P/OS), and DEC internally ported RT-11 (as PRO/RT-11) and COS. PDP-11s were also famous for running Unix and so DEC needed a Unix for the Pro as well, though eventually only one official option was ever available: a port of VenturCom's Venix based on V7 Unix and later System V Release 2.0 called PRO/VENIX. After the last article, I had the distinct pleasure of being contacted by Paul Kleppner, the company's first paid employee in 1981, who was part of the group at VenturCom that did the Pro port and stayed at the company until 1988. Venix was originally developed from V6 Unix on the PDP-11/23, incorporating the real-time kernel extensions (such as semaphores and asynchronous I/O) of Myron Zimmerman, then a postdoc in physics at MIT; Kleppner's father was the professor of the lab Zimmerman worked in. Zimmerman founded VenturCom in 1981 to capitalize on the emerging Unix market, becoming one of the earliest commercial Unix licensees. Venix-11 was subsequently based on the later V7 Unix, as was Venix/86, which was the first Unix on the IBM PC in January 1983 and was ported to the DEC Rainbow as Venix/86R. In addition to its real-time extensions and enhanced segmentation capability, critical for memory management in smaller 16-bit address spaces, it also included a full desktop graphics package.
Notably, DEC themselves were also a Unix licensee through their Unix Engineering Group and already had an enhanced V7 Unix of their own running on the PDP-11, branded initially as V7M. Subsequently the UEG developed a port of 4.2BSD with some System V components for the VAX and planned to release it as Ultrix-32, simultaneously retconning V7M as Ultrix-11 even though it had little in common with the VAX release. Paul recalls that DEC did attempt a port of Ultrix-11 to the Pro 350 themselves but ran into intractable performance problems. By then the clock was ticking on the Pro relaunch and the issues with Ultrix-11 likely prompted DEC to look for alternatives. Crucially, Zimmerman had managed to upgrade Venix-11's kernel while still keeping it small, a vital aspect on his 11/23, which lacked split instruction and data addressing and would have had to page in and out a larger kernel otherwise. Moreover, the 11/23 used an F-11 CPU — the same CPU as the original Professional 350 and 325. DEC quickly commissioned VenturCom to port their own system over to the Pro, which Paul says was a real win for VenturCom, and the first release came out in July 1984 complete with its real-time features intact and graphics support for the Pro's bitmapped screen. It was upgraded ("PRO/VENIX Rev 2.0") in October 1984, adding support for the new top-of-the-line DEC Professional 380, and then switched to System V (SVR2) in July 1985 with PRO/VENIX V2.0. (For its part Ultrix-11 was released as such in 1984 as well, but never for the Pro series.) Keep that kernel version history in mind for when we get to oddiments of the C compiler. As for networking, though, with the exception of UUCP over serial, none of these early versions of Venix on either the PDP-11 or 8086 supported any kind of network connectivity out of the box — officially the only Pro operating system to support its Ethernet upgrade option was P/OS 2.0. Although all Pros have a 15-pin AUI network port, it isn't activated until an Ethernet CTI card is installed. (While Stan P. found mention of a third-party networking product called Fusion by Network Research Corporation which could run on PRO/VENIX, Paul's recollection is that this package ran into technical problems with kernel size during development. No examples of the PRO/VENIX version have so far been located and it may never have actually been released. You'll hear about it if a copy is found. The unofficial Pro 2.9BSD port also supports the network card, but that was always an under-the-table thing.) Since we run Venix on our Pro, that means currently our only realistic option to get this on the 'Nets is also over a serial port; we'll use the lower speed printer port for our serial IP implementation. PRO/VENIX supports using only the RS-423 port as a remote terminal, and because it's twice as fast, it's more convenient for logins and file exchange over Kermit (which also has no TCP/IP overhead). Using the printer port also provides us with a nice challenge: if our stack works acceptably well at 4800bps, it should do even better at higher speeds if we port it elsewhere. On the Pro, we connect to our upstream host using a BCC05 cable (in the middle of this photograph), which terminates in a regular 25-pin RS-232 on the other end. Now for the software part. There are other small TCP/IP stacks, notably things like Adam Dunkels' lwIP and so on.
But even SVR2 Venix is by present standards an old Unix with a much less extensive libc and more primitive C compiler — in a short while you'll see just how primitive — and relatively modern code like lwIP's would require a lot of porting. Ideally we'd like a very minimal, indeed barely adequate, stack that can do simple tasks and can be expressed in a fashion acceptable to a now antiquated compiler. Once we've written it, it would be nice if it were also easily portable to other very limited systems, even by directly translating it to assembly language if necessary. What we want this barebones stack to accomplish will inform its design: it isn't going to act as a server, because we'd have to keep the machine and the hardware on 24-7 to make such a use case meaningful. The Ethernet option was reportedly competent at server tasks, but Ethernet has more bandwidth, and that card also has additional on-board hardware. Let's face the cold reality: as a server, we'd find interacting with it over the serial port unsatisfactory at best and we'd use up a lot of power and MTBF keeping it on more than we'd like to. Therefore, we really should optimize for the client case, which means we also only need to run the client when we're performing a network task. Similarly, the Pro has no remote login capacity; like, I dunno, a C64, the person on the console gets it all. Therefore, we really should optimize for the single user case, which means we can simplify our code substantially by merely dealing with sockets sequentially, one at a time, without having to worry about routing packets we get on the serial port to other tasks or multiplexing them. Doing so would require extra work for dual-socket protocols like FTP, but we're already going to use directly-attached Kermit for that, and if we really want file transfer over TCP/IP there are other choices. (On a larger antique system with multiple serial ports, we could consider a setup where each user uses a separate outgoing serial port as their own link, which would also work under this scheme.) Some of you may find this conflicts hard with your notion of what a "stack" should provide, but I also argue that the breadth of a full-service driver would be wasted on a limited configuration like this and be unnecessarily more complex to write and test. Worse, in many cases, is better, and I assert this particular case is one of them. Keeping the above in mind, what are appropriate client tasks for a microcomputer from 1984, now over 40 years old — even a fairly powerful one by the standards of the time — to do over a slow TCP/IP link? Simple, text-oriented protocols like finger, Gopher and HTTP/1.x come to mind. (Crypto Ancienne's carl can serve as an HTTP-to-HTTPS proxy to handle the TLS part, if necessary.) We could use protocols like these to download and/or view files from systems that aren't directly connected, or to send and receive status information. One task that is also likely common is an interactive terminal connection (e.g., Telnet, rlogin) to another host. However, as a client this particular deployment is still likely to hit the same sorts of latency problems for the same reasons we would experience connecting to it as a server. The other tasks here are not highly sensitive to latency, require only a single "connection" and no multiplexing, and are simple protocols which are easy to implement. Let's call this feature set our minimum viable product. Because we're writing only for a couple of specific use cases, and to make them even more explicit and easy to translate, we're going to take the unusual approach of having each of these clients handle their own raw packets in a bytewise manner.
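To give a flavour of what "handling raw packets in a bytewise manner" means, here is a sketch of building an IPv4 header into a plain char buffer one byte at a time: no structs, no htons(), and no assumptions about host endianness. This is not BASS source; mkiphdr() and cksum() are illustrative names (kept short for an old compiler's taste), and cksum() is assumed to be the kind of checksum utility described in the next paragraph.

```c
/* Sketch: fill in a minimal 20-byte IPv4 header, bytewise.
 * src/dst are 4-byte addresses; total_len includes this header;
 * proto is e.g. 1 (ICMP) or 17 (UDP).
 * Note: on a compiler where plain char is signed, anything read back
 * out of b[] wants an & 0xff before use.
 */
#define IP_HLEN 20

extern unsigned int cksum(char *buf, int len);  /* assumed utility, see below */

void mkiphdr(char *b, char *src, char *dst, int total_len, int proto, int ident)
{
    unsigned int sum;
    int i;

    b[0] = 0x45;                       /* version 4, 20-byte header   */
    b[1] = 0x00;                       /* DSCP/ECN                    */
    b[2] = (total_len >> 8) & 0xff;    /* total length, big-endian    */
    b[3] = total_len & 0xff;
    b[4] = (ident >> 8) & 0xff;        /* identification              */
    b[5] = ident & 0xff;
    b[6] = 0x40;                       /* flags: don't fragment       */
    b[7] = 0x00;
    b[8] = 64;                         /* TTL                         */
    b[9] = proto & 0xff;
    b[10] = 0;                         /* checksum placeholder        */
    b[11] = 0;
    for (i = 0; i < 4; i++) {
        b[12 + i] = src[i];
        b[16 + i] = dst[i];
    }
    sum = cksum(b, IP_HLEN);           /* ones'-complement of header  */
    b[10] = (sum >> 8) & 0xff;
    b[11] = sum & 0xff;
}
```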
For the actual serial link we're going to go even more barebones and use old-school RFC 1055 SLIP instead of PPP (uncompressed, too, not even Van Jacobson CSLIP). This is trivial to debug and straightforward to write, and if we do so in a relatively encapsulated fashion, we could consider swapping in CSLIP or PPP later on. A couple of utility functions will do the IP checksum algorithm and reading and writing the serial port, and DNS and some aspects of TCP also get their own utility subroutines, but otherwise all of the programs we will create will read and write their own network datagrams, using the SLIP code to send and receive over the wire. The C we will write will also be intentionally very constrained, using bytewise operations, assuming nothing about endianness and using as little of the C standard library as possible. For types, you only need some sort of 32-bit long, which need not be native, an int of at least 16 bits, and a char type — which can be signed, and in fact has to be to run on earlier Venices (read on). You can run the entirety of the code with just malloc/free, read/write/open/close, strlen/strcat, sleep, rand/srand and time for the srand seed (and fprintf for printing debugging information, if desired). On a system with little or no operating system support, almost all of these primitive library functions are easy to write or simulate, and we won't even assume we're capable of non-blocking reads despite the fact Venix can do so. After all, from that which little is demanded, even less is expected. The classic way to terminate a SLIP link on the host side is slattach, which effectively makes a serial port directly into a network interface. Such an arrangement would be the most flexible approach from the user's perspective because you necessarily have a fixed, bindable external address, but obviously such a scheme didn't scale over time. With the proliferation of dialup Unix shell accounts in the late 1980s and early 1990s, closed-source tools like 1993's The Internet Adapter ("TIA") could provide the SLIP and later PPP link just by running them from a shell prompt. Because they synthesize artificial local IP addresses, sort of NAT before the concept explicitly existed, the architecture of such tools prevented directly creating listening sockets — though for some situations this could be considered more of a feature than a bug. Any needed external ports could be proxied by the software anyway and later network clients tended not to require it, so for most tasks it was more than sufficient. Closed-source and proprietary SLIP/PPP-over-shell solutions like TIA were eventually displaced by open source alternatives, most notably SLiRP. SLiRP (hereafter Slirp so I don't gouge my eyes out) emerged in 1995 and used a similar architecture to TIA, handing out virtual addresses on a synthetic network and bridging that network to the Internet through the host system. It rapidly became the SLIP/PPP shell solution of choice, leading to its outright ban by some shell ISPs who claimed it violated their terms of service. As direct SLIP/PPP dialup became more common than shell accounts, during which time yours truly upgraded to a 56K Mac modem I still have around here somewhere, Slirp eventually became most useful for connecting small devices via their serial ports (PDAs and mobile phones especially, but really anything — subsets of Slirp are still used in emulators today like QEMU for a similar purpose) to a LAN. By a shocking and completely contrived coincidence, that's exactly what we'll be doing!
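For concreteness, here are hedged sketches of the two utilities mentioned above, written in the same constrained style (bytewise access, no unsigned char, & 0xff everywhere, and long arithmetic forced where a 16-bit int would overflow). None of these names are from the real BASS source; serputc() stands in for whatever actually pushes a byte out of the serial port, and ANSI prototypes are used for readability even though the Venix compiler would want K&R-style definitions.

```c
/* SLIP framing bytes per RFC 1055. */
#define SLIP_END     0xc0
#define SLIP_ESC     0xdb
#define SLIP_ESC_END 0xdc
#define SLIP_ESC_ESC 0xdd

extern void serputc(int c);            /* assumed: blocking write of one byte */

/* Ones'-complement Internet checksum over len bytes, summed 16 bits at a
 * time but fetched bytewise so host endianness never matters. The 256L
 * multiply keeps the partial sum in long arithmetic on a 16-bit int.
 */
unsigned int cksum(char *buf, int len)
{
    unsigned long sum = 0;
    int i = 0;

    while (len > 1) {
        sum += (buf[i] & 0xff) * 256L + (buf[i + 1] & 0xff);
        i += 2;
        len -= 2;
    }
    if (len)                            /* odd trailing byte */
        sum += (buf[i] & 0xff) * 256L;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (unsigned int)(~sum & 0xffff);
}

/* Send one IP datagram over the wire as a SLIP frame. */
void slipsnd(char *pkt, int len)
{
    int i, c;

    serputc(SLIP_END);                  /* flush any line noise */
    for (i = 0; i < len; i++) {
        c = pkt[i] & 0xff;
        if (c == SLIP_END) {
            serputc(SLIP_ESC);
            serputc(SLIP_ESC_END);
        } else if (c == SLIP_ESC) {
            serputc(SLIP_ESC);
            serputc(SLIP_ESC_ESC);
        } else {
            serputc(c);
        }
    }
    serputc(SLIP_END);                  /* end of datagram */
}
```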
Slirp has not been officially maintained since 2006. There is no package in Fedora, which is my usual desktop Linux, and the one in Debian reportedly has issues. A stack of patch sets circulated thereafter, but the planned 1.1 release never happened and other crippling bugs remain, some of which were addressed in other patches that don't seem to have made it into any release, source or otherwise. If you tried to build Slirp from source on a modern system and it just immediately exits, you got bit. I have incorporated those patches and a couple of my own to port naming and the configure script, plus some additional fixes, into an unofficial "Slirp-CK" which is on Github. It builds the same way as prior versions and is tested on Fedora Linux. I'm working on getting it functional on current macOS also. Next, I wrote up our four basic functional clients: ping, DNS lookup, NTP client (it doesn't set the clock, just shows you the stratum, refid and time which you can use for your own purposes), and TCP client. The TCP client accepts strings up to a defined maximum length, opens the connection, sends those strings (optionally separated by CRLF), and then reads the reply until the connection closes. This all seemed to work great on the Linux box, which you yourself can play with as a toy stack (directions at the end). Unfortunately, I then pushed it over to the Pro with Kermit and the compiler immediately started complaining. SLIP is a very thin layer on IP packets. There are exactly four metabytes, which I created preprocessor defines for: A SLIP packet ends with SLIP_END, or hex $c0. Where this must occur within a packet, it is replaced by a two byte sequence for unambiguity, SLIP_ESC SLIP_ESC_END, or hex $db $dc, and where the escape byte must occur within a packet, it gets a different two byte sequence, SLIP_ESC SLIP_ESC_ESC, or hex $db $dd. Although I initially set out to use defines and symbols everywhere instead of naked bytes, and wrote slip.c on that basis, I eventually settled on raw bytes afterwards using copious comments so it was clear what was intended to be sent. That probably saved me a lot of work renaming everything, because: Dimly I recalled that early C compilers, including System V, limit their identifiers to eight characters (the so-called "Ritchie limit"). At this point I probably should have simply removed them entirely for consistency with their absence elsewhere, but I went ahead and trimmed them down to more opaque, pithy identifiers. That wasn't the only problem, though. I originally had two functions in slip.c, slip_start and slip_stop, and it didn't like that either despite each appearing to have a unique eight-character prefix: That's because their symbols in the object file are actually prepended with various metacharacters like _ and ~, so effectively you only get seven characters in function identifiers, an issue this error message fails to explain clearly. The next problem: there's no unsigned char, at least not in PRO/VENIX Rev. 2.0 which I want to support because it's more common, and presumably the original versions of PRO/VENIX and Venix-11. (This type does exist in PRO/VENIX V2.0, but that's because it's System V and has a later C compiler.) In fact, the unsigned keyword didn't exist at all in the earliest C compilers, and even when it did, it couldn't be applied to every basic type. 
Although unsigned char was introduced in V7 Unix and is documented as legal in the PRO/VENIX manual, and it does exist in Venix/86 2.1 which is also a V7 Unix derivative, the PDP-11 and 8086 C compilers have different lineages and Venix's V7 PDP-11 compiler definitely doesn't support it: I suspect this may not have been intended because unsigned int works (unsigned long would be pointless on this architecture, and indeed correctly generates Misplaced 'long' on both versions of PRO/VENIX). Regardless of why, however, the plain char type on the PDP-11 is signed, and for compatibility reasons here we'll have no choice but to use it. Recall that when C89 was being codified, plain char was left as an ambiguous type since some platforms (notably PDP-11 and VAX) made it signed by default and others made it unsigned, and C89 was more about codifying existing practice than establishing new ones. That's why you see this on a modern 64-bit platform, e.g., my POWER9 workstation, where plain char is unsigned: If we change the original type explicitly to signed char on our POWER9 Linux machine, that's different: and, accounting for different sizes of int, seems similar on PRO/VENIX V2.0 (again, which is System V): but the exact same program on PRO/VENIX Rev. 2.0 behaves a bit differently: The differences in int size we expect, but there's other kinds of weird stuff going on here. The PRO/VENIX manual lists all the various permutations about type conversions and what gets turned into what where, but since the manual is already wrong about unsigned char I don't think we can trust the documentation for this part either. Our best bet is to move values into int and mask off any propagated sign bits before doing comparisons or math, which is agonizing, but reliable. That means throwing around a lot of seemingly superfluous & 0xff to make sure we don't get negative numbers where we don't want them. Once I got it built, however, there were lots of bugs. Many were because it turns out the compiler isn't too good with 32-bit long, which is not a native type on the 16-bit PDP-11. This (part of the NTP client) worked on my regular Linux desktop, but didn't work in Venix: The first problem is that the intermediate shifts are too large and overshoot, even though they should be in range for a long. Consider this example: On the POWER9, accounting for the different semantics of %lx, But on Venix, the second shift blows out the value. We can get an idea of why from the generated assembly in the adb debugger (here from PRO/VENIX V2.0, since I could cut and paste from the Kermit session): (Parenthetical notes: csav is a small subroutine that pushes volatiles r2 through r4 on the stack and turns r5 into the frame pointer; the corresponding cret unwinds this. The initial branch in this main is used to reserve additional stack space, but is often practically a no-op.) The first shift is here at ~main+024. Remember the values are octal, so 010 == 8. r0 is 16 bits wide — no 32-bit registers — so an eight-bit shift is fine. When we get to the second shift, however, it's the same instruction on just one register (030 == 24) and the overflow is never checked. In fact, the compiler never shifts the second part of the long at all. The result is thus zero. The second problem in this example is that the compiler never treats the constant as a long even though statically there's no way it can fit in a 16-bit int. 
To get around those two gotchas on both Venices here, I rewrote it this way: An alternative to a second variable is to explicitly mark the epoch constant itself as long, e.g., by casting it, which also works. Here's another example for your entertainment. At least some sort of pseudo-random number generator is crucial, especially for TCP when selecting the pseudo-source port and initial sequence numbers, or otherwise Slirp seemed to get very confused because we would "reuse" things a lot. Unfortunately, the obvious typical idiom to seed it like srand(time(NULL)) doesn't work: srand() expects a 16-bit int but time(NULL) returns a 32-bit long, and it turns out the compiler only passes the 16 most significant bits of the time — i.e., the ones least likely to change — to srand(). Here's the disassembly as proof (contents trimmed for display here; since this is a static binary, we can see everything we're calling): At the time we call the glue code for time from main, the value under the stack pointer (i.e., r6) is cleared immediately beforehand since we're passing NULL (at ~main+06). We then invoke the system call, which per the Venix manual for time(2) uses two registers for the 32-bit result, namely r0 (high bits) and r1 (low bits). We passed a null pointer, so the values remain in those registers and aren't written anywhere (branch at _time+014). When we return to ~main+014, however, we only put r0 on the stack for srand (remember that r5 is being used as the frame pointer; see the disassembly I provided for csav) and r1 is completely ignored. Why would this happen? It's because time(2) isn't declared anywhere in /usr/include or /usr/include/sys (the two C include directories), nor for that matter rand(3) or srand(3). This is true of both Rev. 2.0 and V2.0. Since the symbols are statically present in the standard library, linking will still work, but since the compiler doesn't know what it's supposed to be working with, it assumes int and fails to handle both halves of the long. One option is to manually declare everything ourselves. However, from the assembly at _time+016 we do know that if we pass a pointer, the entire long value will get placed there. That means we can also do this: Now this gets the lower bits and there is sufficient entropy for our purpose (though obviously not a cryptographically-secure PRNG). Interestingly, the Venix manual recommends using the time as the seed, but doesn't include any sample code. At any rate this was enough to make the pieces work for IP, ICMP and UDP, but TCP would bug out after just a handful of packets. As it happens, Venix has rather small serial buffers by modern standards: tty(7), based on the TIOCQCNT ioctl(2), appears to have just a 256-byte read buffer (sg_ispeed is only char-sized). If we don't make adjustments for this, we'll start losing framing when the buffer gets overrun, as in this extract from a test build with debugging dumps on and a maximum segment size/window of 512 bytes. Here, the bytes marked by dashes are the remote end and the bytes separated by dots are what the SLIP driver is scanning for framing and/or throwing away; you'll note there is obvious ASCII data in them. If we make the TCP MSS and window on our client side 256 bytes, there is still retransmission, but the connection is more reliable since overrun occurs less often and seems to work better than a hard cap on the maximum transmission unit (e.g., "mtu 256") from SLiRP's side. 
Our only consequence to dropping the TCP MSS and window size is that the TCP client is currently hard-coded to just send one packet at the beginning (this aligns with how you'd do finger, HTTP/1.x, gopher, etc.), and that datagram uses the same size which necessarily limits how much can be sent. If I did the extra work to split this over several datagrams, it obviously wouldn't be a problem anymore, but I'm lazy and worse is better! The connection can be made somewhat more reliable still by improving the SLIP driver's notion of framing. RFC 1055 only specifies that the SLIP end byte (i.e., $c0) occur at the end of a SLIP datagram, though it also notes that it was proposed very early on that it could also start datagrams — i.e., if two occur back to back, then it just looks like a zero length or otherwise obviously invalid entity which can be trivially discarded. However, since there's no guarantee or requirement that the remote link will do this, we can't assume it either. We also can't just look for a $45 byte (i.e., IPv4 and a 20 byte length) because that's an ASCII character and appears frequently in text payloads. However, $45 followed by a valid DSCP/ECN byte is much less frequent, and most of the time this byte will be either $00, $08 or $10; we don't currently support ECN (maybe we should) and we wouldn't find other DSCP values meaningful anyway. The SLIP driver uses these sequences to find the start of a datagram and $c0 to end it. While that doesn't solve the overflow issue, it means the SLIP driver will be less likely to go out of framing when the buffer does overrun and thus can better recover when the remote side retransmits. And, well, that's it. There are still glitches to bang out but it's good enough to grab Hacker News: src/ directory, run configure and then run make (parallel make is fine, I use -j24 on my POWER9). Connect your two serial ports together with a null modem, which I assume will be /dev/ttyUSB0 and /dev/ttyUSB1. Start Slirp-CK with a command line like ./slirp -b 4800 "tty /dev/ttyUSB1" but adjusting the baud and path to your serial port. Take note of the specified virtual and nameserver addresses: Unlike the given directions, you can just kill it with Control-C when you're done; the five zeroes are only if you're running your connection over standard output such as direct shell dial-in (this is a retrocomputing blog so some of you might). To see the debug version in action, next go to the BASS directory and just do a make. You'll get a billion warnings but it should still work with current gcc and clang because I specifically request -std=c89. If you use a different path for your serial port (i.e., not /dev/ttyUSB0), edit slip.c before you compile. You don't do anything like ifconfig with these tools; you always provide the tools the client IP address they'll use (or create an alias or script to do so). Try this initial example, with slirp already running: Because I'm super-lazy, you separate the components of the IPv4 address with spaces, not dots. In Slirp-land, 10.0.2.2 is always the host you are connected to. You can see the ICMP packet being sent, the bytes being scanned by the SLIP driver for framing (the ones with dots), and then the reply (with dashes). These datagram dumps have already been pre-processed for SLIP metabytes. Unfortunately, you may not be able to ping other hosts through Slirp because there's no backroute but you could try this with a direct SLIP connection, an exercise left for the reader. 
If Slirp doesn't want to respond and you're sure your serial port works (try testing both ends with Kermit?), you can recompile it with -DDEBUG (change this in the generated Makefile) and pass your intended debug level like -d 1 or -d 3. You'll get a file called slirp_debug with some agonizingly detailed information so you can see if it's actually getting the datagrams and/or liking the datagrams it gets. For nslookup, ntp and minisock, the second address becomes your accessible recursive nameserver (or use -i to provide an IP). The DNS dump is also given in the debug mode with slashes for the DNS answer section. nslookup and ntp are otherwise self-explanatory: minisock takes a server name (or IP) and port, followed by optional strings. The strings, up to 255 characters total (in this version), are immediately sent with CR-LFs between them except if you specify -n. If you specify no strings, none are sent. It then waits on that port for data and exits when the socket closes. This is how we did the HTTP/1.0 requests in the screenshots. On the DEC Pro, this has been tested on my trusty DEC Professional 380 running PRO/VENIX V2.0. It should compile and run on a 325 or 350, and on at least PRO/VENIX Rev. V2.0, though I don't have any hardware for this and Xhomer's serial port emulation is not good enough for this purpose (so unfortunately you'll need a real DEC Pro until I or Tarek get around to fixing it). The easiest way to get it over there is Kermit. Assuming you have this already, connect your host and the Pro on the "real" serial port at 9600bps. Make sure both sides are set to binary and just push all the files over (except the Markdown documentation unless you really want), and then do a make -f Makefile.venix (it may have been renamed to makefile.venix; adjust accordingly). Establishing the link is as simple as connecting your server's serial port to the other end of the BCC05 or equivalent from the Pro and starting Slirp to talk to that port (on my system, it's even the same port, so the same command line suffices). If you experience issues with the connection, the easiest fix is to just bounce Slirp — because there are no timeouts, there are also no retransmits. I don't know if this is hitting bugs in Slirp or in my code, though it's probably the latter. Nevertheless, I've been able to run stuff most of the day without issue. It's nice to have a simple network option and the personal satisfaction of having written it myself. There are many acknowledged deficiencies, mostly because I assume little about the system itself and tried to keep everything very simplistic. There are no timeouts and thus no retransmits, and if you break the TCP connection in the middle there will be no proper teardown. Also, because I used Slirp for the other side (as many others will), and because my internal network is full of machines that have no idea what IPv6 is, there is no IPv6 support. I agree there should be and SLIP doesn't care whether it gets IPv4 or IPv6, but for now that would require patching Slirp which is a job I just don't feel up to at the moment. I'd also like to support at least CSLIP in the future. In the meantime, if you want to try this on other operating systems, the system-dependent portions are in compat.h and slip.c with a small amount in ntp.c for handling time values. You will likely want to make changes to where your serial ports are and the speed they run at and how to make that port "raw" in slip.c. 
You should also add any extra #includes to compat.h that your system requires. I'd love to hear about it running other places. Slirp-CK remains under the original modified Slirp license and BASS is under the BSD 2-clause license. You can get Slirp-CK and BASS at Github.
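On that porting note: making the port "raw" on the modern POSIX side of the link usually boils down to something like the sketch below (the Venix end uses the much older sgtty/ioctl interface instead). This is an assumption-laden illustration, not the actual slip.c: the function name, the B4800 rate and the use of cfmakeraw() (a BSD/glibc extension) are all mine.

```c
/* Sketch: open a serial port and put it into raw 8N1 mode for SLIP. */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

int open_slip_port(const char *path)
{
    struct termios t;
    int fd = open(path, O_RDWR | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) < 0) {
        close(fd);
        return -1;
    }
    cfmakeraw(&t);                 /* no echo, no line discipline, 8 bits */
    cfsetispeed(&t, B4800);        /* match whatever Slirp was started with */
    cfsetospeed(&t, B4800);
    t.c_cc[VMIN]  = 1;             /* blocking, byte-at-a-time reads */
    t.c_cc[VTIME] = 0;
    if (tcsetattr(fd, TCSANOW, &t) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```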

15 hours ago 2 votes
Transactions are a protocol

Transactions are not an intrinsic part of a storage system. Any storage system can be made transactional: Redis, S3, the filesystem, etc. Delta Lake and Orleans demonstrated techniques to make S3 (or cloud storage in general) transactional. Epoxy demonstrated techniques to make Redis (and any other system) transactional. And of course there's always good old Two-Phase Commit. If you don't want to read those papers, I wrote about a simplified implementation of Delta Lake and also wrote about a simplified MVCC implementation over a generic key-value storage layer. It is both the beauty and the burden of transactions that they are not intrinsic to a storage system. Postgres and MySQL and SQLite have transactions. But you don't need to use them. It isn't possible to require you to use transactions. Many developers, myself a few years ago included, do not know why you should use them. (Hint: read Designing Data Intensive Applications.) And you can take it even further by ignoring the transaction layer of an existing transactional database and implementing your own transaction layer as Convex has done (the Epoxy paper above also does this). It isn't entirely clear that you have a lot to lose by implementing your own transaction layer since the indexes you'd want on the version field of a value would only be as expensive or slow as any other secondary index in a transactional database. Though why you'd do this isn't entirely clear (I'd like to read about this from Convex some time). It's useful to see transaction protocols as another tool in your system design tool chest when you care about consistency, atomicity, and isolation. Especially as you build systems that span data systems. Maybe, as Ben Hindman hinted at the last NYC Systems, even proprietary APIs will eventually provide something like two-phase commit so physical systems outside our control can become transactional too.
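The "protocol, not storage feature" point can be made concrete with a tiny sketch of the two-phase commit idea mentioned above: a coordinator that knows nothing about the participants' storage engines, only that each can promise ("prepare") and then keep or discard that promise. This is a sketch only; real 2PC also needs durable logging and recovery, which are elided, and the struct layout is purely illustrative.

```c
/* Minimal two-phase commit coordinator over opaque participants. */
#include <stddef.h>

typedef struct participant {
    int  (*prepare)(void *state);   /* return nonzero iff able to commit */
    void (*commit)(void *state);
    void (*abort)(void *state);
    void *state;                    /* e.g. a Redis handle, an S3 client... */
} participant;

/* Returns 1 if the transaction committed everywhere, 0 if it aborted. */
int two_phase_commit(participant *p, size_t n)
{
    size_t i;

    /* Phase 1: ask every participant to prepare. */
    for (i = 0; i < n; i++) {
        if (!p[i].prepare(p[i].state)) {
            /* Someone said no: roll back those already prepared. */
            while (i-- > 0)
                p[i].abort(p[i].state);
            return 0;
        }
    }
    /* Phase 2: everyone promised, so tell them all to commit. */
    for (i = 0; i < n; i++)
        p[i].commit(p[i].state);
    return 1;
}
```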

21 hours ago 2 votes
Humanities Crash Course Week 16: The Art of War

In week 16 of the humanities crash course, I revisited the Tao Te Ching and The Art of War. I just re-read the Tao Te Ching last year, so I only revisited my notes now. I’ve also read The Art of War a few times, but decided to re-visit it now anyway. Readings Both books are related. The Art of War is older; Sun Tzu wrote it around 500 BCE, at a time when war was becoming more “professionalized” in China. The book aims convey what had (or hadn’t) worked in the battlefield. The starting point is conflict. There’s an enemy we’re looking to defeat. The best victory is achieved without engagement. That’s not always possible, so the book offers pragmatic suggestions on tactical maneuvers and such. It gives good advice for situations involving conflict, which is why they’ve influenced leaders (including businesspeople) throughout centuries: It’s better to win before any shots are fired (i.e., through cunning and calculation.) Use deception. Don’t let conflicts drag on. Understand the context to use it to your advantage. Keep your forces unified and disciplined. Adapt to changing conditions on the ground. Consider economics and logistics. Gather intelligence on the opposition. The goal is winning through foresight rather than brute force — good advice! The Tao Te Ching, written by Lao Tzu around the late 4th century BCE, is the central text in Taoism, a philosophy that aims for skillful action by aligning with the natural order of the universe — i.e., doing through “non-doing” and transcending distinctions (which aren’t present in reality but layered onto experiences by humans.) Tao means Way, as in the Way to achieve such alignment. The book is a guide to living the Tao. (Living in Tao?) But as it makes clear from its very first lines, you can’t really talk about it: the Tao precedes language. It’s a practice — and the practice entails non-striving. Audiovisual Music: Gioia recommended the Beatles (The White Album, Sgt. Pepper’s, and Abbey Road) and Rolling Stones (Let it Bleed, Beggars Banquet, and Exile on Main Street.) I’d heard all three Rolling Stones albums before, but don’t know them by heart (like I do with the Beatles.) So I revisited all three. Some songs sounded a bit cringe-y, especially after having heard “real” blues a few weeks ago. Of the three albums, Exile on Main Street sounds more authentic. (Perhaps because of the band member’s altered states?) In any case, it sounded most “in the Tao” to me — that is, as though the musicians surrendered to the experience of making this music. It’s about as rock ‘n roll as it gets. Arts: Gioia recommended looking at Chinese architecture. As usual, my first thought was to look for short documentaries or lectures in YouTube. I was surprised by how little there was. Instead, I read the webpage Gioia suggested. Cinema: Since we headed again to China, I took in another classic Chinese film that had long been on my to-watch list: Wong Kar-wai’s IN THE MOOD FOR LOVE. I found it more Confucian than Taoist, although its slow pacing, gentleness, focus on details, and passivity strike something of a Taoist mood. Reflections When reading the Tao Te Ching, I’m often reminded of this passage from the Gospel of Matthew: No man can serve two masters: for either he will hate the one, and love the other; or else he will hold to the one, and despise the other. Ye cannot serve God and mammon. Therefore I say unto you, Take no thought for your life, what ye shall eat, or what ye shall drink; nor yet for your body, what ye shall put on. 
Is not the life more than meat, and the body than raiment? Behold the fowls of the air: for they sow not, neither do they reap, nor gather into barns; yet your heavenly Father feedeth them. Are ye not much better than they? Which of you by taking thought can add one cubit unto his stature? And why take ye thought for raiment? Consider the lilies of the field, how they grow; they toil not, neither do they spin: And yet I say unto you, That even Solomon in all his glory was not arrayed like one of these. Wherefore, if God so clothe the grass of the field, which to day is, and to morrow is cast into the oven, shall he not much more clothe you, O ye of little faith? Therefore take no thought, saying, What shall we eat? or, What shall we drink? or, Wherewithal shall we be clothed? (For after all these things do the Gentiles seek:) for your heavenly Father knoweth that ye have need of all these things. But seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you. Take therefore no thought for the morrow: for the morrow shall take thought for the things of itself. Sufficient unto the day is the evil thereof. The Tao Te Ching is older and from a different culture, but “Consider the lilies of the field, how they grow; they toil not, neither do they spin” has always struck me as very Taoistic: both texts emphasize non-striving and putting your trust on a higher order. Even though it’s even older, that spirit is also evident in The Art of War. It’s not merely letting things happen, but aligning mindfully with the needs of the time. Sometimes we must fight. Best to do it quickly and efficiently. And best yet if the conflict can be settled before it begins. Notes on Note-taking This week, I started using ChatGPT’s new o3 model. Its answers are a bit better than what I got with previous models, but there are downsides. For one thing, o3 tends to format answers in tables rather than lists. This works well if you use ChatGPT in a wide window, but is less useful on a mobile device or (as in my case) on a narrow window to the side. This is how I usually use ChatGPT on my Mac: in a narrow window. o3’s responses often include tables that get cut off in this window. For another, replies take much longer as the AI does more “research” in the background. As a result, it feels less conversational than 4o — which changes how I interact with it. I’ll play more with o3 for work, but for this use case, I’ll revert to 4o. Up Next Gioia recommends Apulelius’s The Golden Ass. I’ve never read this, and frankly feel weary about returning to the period of Roman decline. (Too close to home?) But I’ll approach it with an open mind. Again, there’s a YouTube playlist for the videos I’m sharing here. I’m also sharing these posts via Substack if you’d like to subscribe and comment. See you next week!

14 hours ago 1 votes
My approach to teaching electronics

Explaining the reasoning behind my series of articles on electronics -- and asking for your thoughts.

yesterday 2 votes