I've always been a fan of the Lego Technic series, especially those models that have gears and cranks and moving parts. But it seems that Lego is shifting the focus of the Technic series away from functional models, so I had to take matters into my own hands. I think an orrery is the perfect project to build out of Lego Technic parts, as it makes for a cool display set and is functional at the same time.

Introduction

An orrery is a scientific instrument that models the motion of the celestial bodies. What is modelled is typically the orbital period and the period of a body's rotation around its axis. For example, the time it takes the moon to travel around the earth once is ~27 times the time it takes the earth to rotate around its own axis (a day). This relation is accurately modelled in an orrery. However, the orrery does not model distances or the relative sizes of the celestial bodies.

I was very excited when I saw this Earth, Moon and Sun Orrery design by JK Brickworks in 2016: This inspired me to build my own orrery with the same spheres for the sun, earth and moon, but my own custom designed mechanism: While working on this, I experimented with 3D printing custom fixtures to hold Lego gears, but eventually moved on to building an orrery out of Lego parts only.

This orrery is special in the sense that it was designed by me, but it doesn't have any features beyond what the design from JK Brickworks has to offer. It models the orbit of the earth around the sun, the orbit of the moon around the earth, and the spin of the sun, earth and moon. The moon is tidally locked to the earth, meaning it rotates once per orbit. One half of the moon always faces the earth and the other always points away from it. (This also means that if you're on the surface of the moon, the earth always stays in the same spot in the sky.) The tidal locking makes the rotation of the moon easy to implement in an orrery, since the moon will move correctly as soon as it is attached to an arm that goes around the earth at the correct orbital period. You only model the orbital period and get the rotational period for free.

To improve upon my first model, I wanted to add two more features: the axial tilt of the earth and the inclination of the orbit of the moon.

Axial tilt

The rotational axis of the earth is tilted 23.5° relative to its orbital plane. For the orrery, this means that the rotational axis of the earth is not pointing straight up. To make matters worse, the axis always points in the same direction. It does not rotate relative to the stars or the table that the orrery stands on, but it has to rotate relative to the main arm that holds the earth. So not only does the earth itself rotate, the part that holds the earth also needs to rotate to cancel out the rotation of the arm. Modelling this feature allows us to use the orrery to explain the origin of the seasons. To construct the actual tilted holder for the earth sphere, I used an angled connector with a 22.5° angle, which is only one degree off the correct angle.

Inclination of the orbit of the moon

The second phenomenon I wanted to model in my orrery is the fact that the plane of the moon's orbit is inclined relative to the plane of the earth's orbit. The real inclination angle is 5.15°, which you would barely notice if it was modelled accurately. But the inclination of the moon's orbit plays an important role in explaining when eclipses occur, so I think it is useful to model an exaggerated orbital inclination.
In the orrery, we need to make the moon go up and down along its journey around the earth. My mechanical solution for this problem is inspired by this video. The moon arm has a wheel that rides on a tilted circular gear rack. A parallelogram linkage keeps the moon vertically aligned. Just like the attachment point for the earth, this assembly stays at rest relative to the stars, meaning it needs to rotate relative to the main arm.

In summary, there are four parts on the earth arm that need to rotate at the correct rate: the earth, the base for the earth (to ensure the correct direction of the axial tilt), the moon, and the inclination ring for the moon. In theory, the earth base and the moon inclination ring could be a single part (both should rotate at the same rate). But making a tilted ring out of Lego parts is difficult enough, and I didn't manage to do it in a way where the moon "rides" on it while it is driven from the outside. Thus, the solution is to make the moon inclination ring a separate assembly. Being able to move the tilted ring independently allows us to implement an additional astrodynamic phenomenon, the precession of the nodes of the orbit of the moon. It turns out that the direction in which the upper and lower parts of the orbit are pointing isn't fixed after all. It slowly rotates, completing one full turn every 18.6 years.

Concentric axes

One problem specific to building an orrery in Lego is achieving concentric rotation. In a traditional orrery, you can have an unlimited number of parts rotating around the same axis just by placing tubes of increasing size inside each other. But these tubes don't exist as Lego pieces. For the earth-moon system, I need four assemblies rotating independently around the same axis. Here is how I achieved it for the earth-moon subassembly:

The first axis (1) is a regular Lego Technic axle. It drives the rotation of the earth. The second axis (2) is built using a Lego Technic turntable. It is responsible for the orbit of the moon. The third axis (3) is a circular gear rack. It fulfills the same role as a turntable, just with an inner radius large enough to fit another turntable inside it. This part is the tilted ring that is needed to model the orbital inclination of the moon. Now there is only one axis missing, which is the rotating holder for the earth (4). Remember, it needs to rotate relative to the main arm of the orrery so that the rotational axis of the earth points in a constant direction. This is done with a second turntable stacked on top of the first one. To drive it, I'm using this trick: the Differential Gear Casing has a round hole and gears at both ends. That means an axle can be placed inside it and rotate independently, while the casing itself transmits torque through the gears at its ends. This serves the purpose of the aforementioned tubes that don't exist as Lego parts, the gear casing being the exception. The gear casing drives some gears on the first turntable, which in turn drive the second turntable.

Gear sequences

The earth-moon subassembly has four axes that need to rotate at the correct period, namely the earth, the moon, the earth's base and the moon's inclination ring. The entire orrery has two more: the rotation of the sun around its own axis and the rotation of the main arm, which makes the earth travel around the sun in 365.25 days. The sun is not a solid body and spins at different speeds depending on the latitude.
For orrery design, that means that any rotational period between 24 and 38 days can be argued to be correct, since there is a latitude on the sun where the surface rotates at that period.

This gives rise to the problem of finding a gear sequence for a given transmission ratio. In my orrery, one rotation of the crank corresponds to one day. So to get the main arm to rotate once every 365.25 days, it needs to be connected to the crank using a gear sequence with a transmission ratio that is as close as possible to 365.25. The gear ratio of a gear sequence is the product of the tooth-count fractions of each pair of gears in the sequence.

To find a sequence of Lego gears, there are some additional constraints. Most importantly, we are limited by the gears that Lego actually makes: Notice the worm gear in the above image. It's special because it behaves like a one-tooth gear, i.e. paired with a 24-tooth gear, it creates a transmission ratio of 1:24. But it can only be the driving gear in a pairing, never the driven gear.

Suppose we want to achieve a transmission ratio of 17:23. In a traditional orrery, this could be done easily by pairing a 17-tooth and a 23-tooth gear. But these tooth counts aren't available as Lego gears, so we have to find another way. I made a tool to find Lego gear sequences for a given transmission ratio. For our example of a 17/23 ratio, the app finds plenty of solutions with slightly different ratios, for example 20/27, which is only 0.22% off from the target ratio:

The problem of finding a gear sequence for a given transmission ratio is connected to some interesting ideas from number theory. It is closely related to prime factor decomposition. In fact, an exact solution is only possible if all the prime factors of the target ratio are also available as prime factors in the available gears. The tool starts from the target fraction (or slightly different fractions when searching for approximate solutions) and goes through a list of expansions of that fraction that are possible with the prime factors given by the available gears. For each fraction, it calculates all sequences of gears that have the numerator or denominator of that fraction as their product.

Given such a pair of sequences of driver gears and follower gears, the app still needs to decide which driver gear is paired with which follower gear. This is important in the context of Lego gears because some pairings are more practical than others in a Lego project. In particular, if a pair of gears has a tooth sum that is divisible by 16, the distance between the axles is a whole number of Lego units. This combination is the easiest to build. Here is an example of pairing a 24-tooth and a 40-tooth gear. The tooth sum is 64 = 4 * 16, which tells us that the gears will be four Lego units apart. If the tooth sum is divisible by 8, the distance can be achieved by placing the axle at a half-unit offset, which is possible with various building techniques:

The app has a "Fit gears" tab which shows you possible ways of connecting any given pair of gears. For example, this is what I call a "2D connection": these gears fit together if you offset them 1 unit in one direction and 2 units in the other direction, since their distance is quite close to sqrt(1² + 2²). In addition, there are also gears that Lego calls "bevel gears". They have n * 8 + 4 teeth and can be connected perpendicularly, allowing you to change the plane of the mechanism.
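To make the spacing rule and the ratio arithmetic concrete, here is a small Python sketch (a toy version of the idea, not the app's actual TypeScript code; the listed tooth counts are just common Lego gears) that computes the axle distance of a pairing, a rough "buildability" cost like the one the app uses (described below), and the overall ratio of a sequence:

```python
from fractions import Fraction

# A few spur gear tooth counts that Lego produces (the worm gear is a special case).
LEGO_GEARS = [8, 12, 16, 20, 24, 28, 36, 40]

def center_distance(teeth_a, teeth_b):
    """Distance between the two axles in Lego units (16 teeth span one unit)."""
    return (teeth_a + teeth_b) / 16

def pairing_cost(teeth_a, teeth_b):
    """Lower cost = easier to build."""
    total = teeth_a + teeth_b
    if total % 16 == 0:
        return 0  # axles sit a whole number of units apart
    if total % 8 == 0:
        return 1  # half-unit offset, needs special building techniques
    return 2      # needs a diagonal "2D" connection, or doesn't fit at all

def sequence_ratio(pairs):
    """Transmission ratio of a sequence of (driver, follower) gear pairs."""
    ratio = Fraction(1)
    for driver, follower in pairs:
        ratio *= Fraction(follower, driver)
    return ratio

print(center_distance(24, 40))             # 4.0 Lego units, the example from the text
print(pairing_cost(24, 40))                # 0: easiest kind of pairing
print(sequence_ratio([(8, 24), (8, 40)]))  # 15: the last follower turns 15 times slower
```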
So for any pair of a driver and a follower gear, there are potential ways to connect them, but some combinations are easier to build than others. My app assigns a "cost" to each possible pairing of driver and follower gear based on how they can be connected. If the tooth numbers add up to a multiple of 16, the gears are the easiest to connect, meaning the cost is the lowest. Once the tool has a set of driver gears and follower gears, it uses the Hungarian algorithm to generate a matching with minimal cost. Here is an example of a result when searching for a sequence with a ratio of π:

There is one more interesting mathematical curiosity I want to mention. Lego's differential gears have the property that the differential casing rotates at a rate that is the mean of the rates of the two connected axles. For example, if one axle rotates once per second and the other axle three times per second, the casing rotates twice per second ((1 + 3) / 2). We can exploit this to achieve prime factors that are otherwise impossible with Lego gears. For example, there are no Lego gears with a tooth count that is a multiple of 13, so transmission ratios that include 13 as a prime factor are not possible with the standard approach. To achieve a transmission ratio of 13, the gear ratio calculator gives us this sequence: the input axles spin at rates of 1 and 25, and the resulting rate is (1 + 25) / 2 = 13. This is how this sequence could be built: In this example, the blue axle spins 13 times as fast as the red axle.

This brings us to the end of the digression about gear sequences. The app is available here if you want to try it out. The TypeScript source code is also available. Here are all the gears in the final orrery design: These are the rotational and orbital periods in the orrery, compared to the real-world values:

The mechanism was mostly designed using physical parts and my gear sequence app. This is a work-in-progress photo: One of the main challenges during the design of the orrery, besides making it work at all, was getting it to run smoothly. The final design has 70 gears and each additional gear adds a little bit of friction. Another source of resistance is the weight of the earth, as it spins relatively fast and thus contributes a lot to the torque needed to run the orrery. I put a lot of effort into making the orrery run smoothly. But the mechanism can easily get stuck, for example if any moving part is assembled too tightly or if the orrery gathers dust. During assembly, it is important to make sure that every moving part has some slack. One way to work around this problem is to apply a little bit of silicone spray. This makes the orrery run effortlessly without any problem. But arguably, a Lego set should be judged by how well it works on its own, and purists will say this is like gluing parts together.

Designing the foot

I took this photo when I first managed to get the entire mechanism to work. It shows the first version of a foot I designed, which has a Lego motor to drive the orrery. It turns out that this motor is really loud when it runs, and the noise ruins the experience of using the orrery. I envision the orrery as a piece that you can use to explain and demonstrate astrodynamic phenomena to someone. But this doesn't work when the motor noise makes it impossible to have a conversation, so I decided to remove the motor. Designing the foot turned out to be quite tricky; it's an artistic challenge more than a technical one.
I went for a 12-segment design, which is a nod to the twelve months and has a tradition in orrery design. I designed most of the foot digitally using Bricklink Studio. This part lends itself more to digital design since it doesn't move, and due to the 12-fold symmetry, testing any idea requires a large number of parts. Here are some iterations of the foot design: You can see that each design has a little lever to disengage the driving mechanism from the main arm. This allows you to move the arm freely, which is quite useful in practice. Here is a video of the final orrery design in action:

Instructions

I used Bricklink Studio to make a digital model of my orrery. During the design process, I switched back and forth between building with physical parts and designing digitally. I also used this program to create digital building instructions. My takeaway is that it works really well for small models, but it has serious usability problems with larger models. When you make changes to the model or change an early step in the instructions, it often breaks the layout of the instructions in many places, so that most of it needs to be redone. This was a lot of tedious work, but I ended up with a 264-page, 436-step instructions PDF.

I'm selling the digital instructions for 20€ on Rebrickable. On Rebrickable, you get a parts list and some really useful tools to help with acquiring the parts (even without buying the instructions). For example, you can tell it which Lego sets and parts you own, and based on that it knows which of the required parts of the orrery you already have. It also helps you with buying the parts you don't have. Rebrickable estimates that the total cost of the required parts is 260€. Together with the cost of the instructions, this makes the whole project quite expensive for potential buyers. I ended up selling a few sets of instructions, but ultimately this project wasn't a commercial success. In hindsight, it might have been a better idea to give the instructions away for free.

To promote the listing on Rebrickable, I made an animation that shows the movement of the gears in the orrery. I used Mecabricks to convert the Bricklink Studio design into a scene file that contains 3D meshes of all the parts. This could then be opened in Blender, where I created all the animations and the camera movement. Just like making the instructions, this is relatively straightforward, but it became quite complex due to the large number of parts in the design. Here is the resulting animation (at 45 seconds, the structural parts fade out to reveal the mechanism):

Lower part count design

The above design aims to make the best possible Lego orrery with few compromises. This comes at the expense of a high part count and some parts that are hard to get. I made a modified design with a lower part count by redesigning the foot. The foot in this version is much lighter, has fewer ornaments and uses Technic parts instead of traditional bricks. It has one crank instead of two. To make it easier to get the parts for this orrery, I changed the design so that it uses as many parts as possible from Lego's Rough Terrain Crane. I chose this set because it was well reviewed, had a low cost per part, and many people own it. Unfortunately, Lego has now stopped selling it.

Many parts in my first orrery design use a "traditional" color scheme where gears, pins and axles are mostly gray and black. This color scheme was used by Lego in the past and is often preferred by fans.
In the new design, I use the same colors that Lego uses in its current products, which means the parts are easier to get and more of them can be taken from the red crane. The new, more colorful palette makes the building instructions easier to follow, but the colors can look out of place in the orrery.

Other orreries

I released the instructions for my orrery in September 2021. A few months later, in 2022, CaDA released an orrery designed by JK Brickworks, the designer behind the "original" Lego orrery from 2016 that got me interested in this hobby in the first place. The feature set of this orrery is similar to mine, except that it doesn't model the inclination of the orbit of the moon. As you can see, omitting this feature lowers the complexity considerably and allows for a much simpler and more reliable design. The lower part count also makes this orrery accessible to a wider audience. I'm very impressed with this design and I think it's the best product to get for anyone interested in building an orrery.

Two years later, in 2024, Lego released their own orrery design. I was quite excited at the prospect of an official Lego orrery, but I was disappointed by the execution. This orrery has the same feature set as the one from CaDA: it has the earth's axial tilt and the moon's orbit, but not its inclination. Compared to the CaDA design, it has fewer parts but is more expensive. I think Lego was too aggressive in reducing the part count; the orrery is very fragile and wobbly. For the sun and earth, they went with custom hemisphere pieces. I like the idea, as it allows the bodies to be perfectly spherical. But the parts have very prominent connection holes, which are unnecessary and ruin the look. The earth has the continents printed on it, which is a nice idea, but the prints were poorly aligned on my copy. All in all, I would recommend the CaDA orrery over this one. But I'm excited to see that building-block orreries are reaching a bigger audience, and I hope that companies like CaDA and Lego will offer more elaborate designs in the future.

Outlook

This is a mockup of an idea to model all the planets in the solar system without vertical stacking (well, except for the inner planets, but it's mostly a flat design). The four outer planets would move on four tracks of chains that are held in place by the vertical poles (the chains are not shown in the render). Each chain would be driven by a gear attached to the base, and the planet would be attached to one of the chain link pieces. Note that in this setup, the planets only travel around the sun but don't spin around their own axes. This concept is inspired by a design for a clock by Akiyuki. It's hard to tell if the chain could handle the weight of a brick-built planet. Either way, this build would be crazy in terms of dimensions, part count and cost. At the moment, I'm not planning to continue working on this concept. If you remove the constraint of no vertical stacking, the problem becomes only slightly easier, but possible, as demonstrated by Chris Orchard and Brent Waller in their Lego Ideas campaign.

My next project is to build an orrery using laser-cut acrylic sheets instead of Lego parts. This removes a lot of constraints on the gears and parts that can be used, but has the added difficulty that I need to manufacture these parts myself.
This article describes how I generate an infinite city using the Wave Function Collapse algorithm in a way that is fast, deterministic, parallelizable and reliable. It's a follow-up to my 2019 article on adapting the WFC algorithm to generate an infinite world. The new approach presented in this article removes the limitations of my original implementation. I first mentioned these ideas in this Twitter thread.

Objective

The goal is to procedurally generate a 3D environment by placing human-designed blocks on a 3D grid. The blocks need to be placed in accordance with given adjacency constraints. For each of the 6 sides of each block, some information about the face and its symmetry is used to generate a list of possible neighbors. This is different from the original formulation of the WFC algorithm, where the possible blocks, their adjacency rules and their spawn probabilities are extracted automatically from an example texture. In this improved version, the generation method is robust enough to be shipped in a commercial game, so it needs to be reliable, fast, and allow for artistic control over the result.

Wave Function Collapse

This article is aimed at readers who already know how the WFC algorithm works, but here is a brief recap. Remember, I'm skipping the part where blocks are extracted from an example texture and I'm only using the part where we generate a new "texture". (Example by Maxim Gumin on Github.)

The algorithm starts with an array of slots (the "wave function"), where nothing is decided. Each slot has a list of possible blocks (or "modules") that can be placed there, and in the starting state, each list contains all modules. The algorithm then performs collapse steps and constraint propagation steps until the map is fully collapsed or until it has reached a dead end.

Collapse step

We pick the slot with the lowest entropy and collapse it. That means we pick one of the modules from that slot's list of possible modules and set it as the selected module for this slot. Intuitively, the "slot with the lowest entropy" is the one with the least amount of choice. If all modules have the same spawn probability, the slot with the fewest possible modules is the one with the lowest entropy.

Constraint propagation

Collapsing a slot effectively shrinks the list of possible modules of that slot to 1. The constraint propagation step propagates this information through the map by removing modules from the lists of other slots that relied on a different choice for the slot we just collapsed. Constraint propagation is the most compute-intensive part of the WFC algorithm.

End

The algorithm terminates when all slots are collapsed, which means success, or when the list of possible modules for any slot is empty. In that case the procedure has failed, and one could backtrack or start over.

The original approach and its limitations

(Skip this section if you just want to know the solution.) WFC is usually applied to finite maps that can be stored in an array. In my original post, I described why I thought it would be impractical to do a chunk-based WFC implementation. (I had not figured out how to avoid the problem that constraints need to be propagated across chunk boundaries.) Instead, I stored the map in a dictionary, where new slots would be allocated when they were needed or when they were touched by constraint propagation.
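For readers who want the recap in code, here is a minimal, simplified Python sketch of that core loop (uniform module probabilities, no backtracking; the function and variable names are mine, not from the actual implementation):

```python
import random

def neighbors_of(position):
    """Yield (direction, neighbor_position) pairs on a 3D grid."""
    x, y, z = position
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for direction, (dx, dy, dz) in enumerate(offsets):
        yield direction, (x + dx, y + dy, z + dz)

def collapse_map(slots, allowed_neighbors, rng=random.Random(0)):
    """slots: dict mapping grid position -> set of possible module ids.
    allowed_neighbors(module, direction) -> set of modules allowed next to it."""
    while True:
        # Collapse step: pick the uncollapsed slot with the fewest remaining options
        # (the lowest-entropy slot when all modules are equally likely).
        open_slots = [p for p, options in slots.items() if len(options) > 1]
        if not open_slots:
            return slots  # success: every slot is decided
        position = min(open_slots, key=lambda p: len(slots[p]))
        slots[position] = {rng.choice(sorted(slots[position]))}

        # Constraint propagation: remove options that lost all support.
        queue = [position]
        while queue:
            current = queue.pop()
            for direction, neighbor in neighbors_of(current):
                if neighbor not in slots:
                    continue
                supported = {m for m in slots[neighbor]
                             if any(m in allowed_neighbors(c, direction)
                                    for c in slots[current])}
                if not supported:
                    raise RuntimeError("dead end: restart or backtrack")
                if supported != slots[neighbor]:
                    slots[neighbor] = supported
                    queue.append(neighbor)
```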
With this dictionary-based approach, even generating a small area of the map means allocating a large cloud of slots around that area, since constraint propagation could "bounce back" into the area we're interested in. Problems of that approach include:

Non-determinism: The result of the generation depends on the order in which parts of the map are generated (and thus on the path the player takes).
Memory leak: We can't release the memory used to generate a part of the world when the player leaves, since we don't know at what point distant slots no longer have an effect on local world generation.
Reliability: The longer you walk around, the higher the chance that the WFC algorithm runs into a dead end and is unable to continue generating the map.
Single-threaded: Since there are no chunks, all operations on the map data structure need to be sequential and can't run in multiple threads. In practice, the map height had to be limited so that the map generation was fast enough.

My implementation of this flawed approach is still available on Github and a playable demo is on itch.io. If you want to implement your own WFC algorithm, you shouldn't do it like that though!

Chunk-based WFC

The idea is to start with a simple pre-generated, tiling map and generate "fitting" replacements at runtime. However, we do this at an offset so that the seam (which would otherwise look the same for each chunk) is replaced. In this section, I'll explain in detail what that means. This solution is a refinement of ideas proposed by Paul Merrel and BorisTheBrave.

We start by generating a simple, finite, tiling 8x8x8 map: This is done offline. We use a small subset of the available modules to make it as simple as possible. The map is tiling in the sense that the boundaries on opposite sides match, so copies of this map could be placed next to each other seamlessly. Doing that would look like this: Generating a tiling map is done by "wrapping around" the constraint propagation at the map boundary. In a finite map, when we propagate a constraint to a slot outside the map, we discard that information. In a tiling map, the slot on the opposing map boundary is treated as if it was a neighbor.

Next, we pre-generate a set of replacements for our starting map. We use the boundary of the starting map as a boundary constraint to generate these replacements. For any slots on the boundary of these new maps, we only allow modules with a matching profile on those sides that face the map boundary. This means that we can "swap out" our starting map for any of the pre-generated patches without creating a visible seam. Now we can randomly choose from our collection of pre-generated patches at runtime and we have a simple chunk-based infinite world generator: Note that we're not doing any WFC generation at runtime yet, we're just placing pre-generated 8x8x8 block patches. Since all these patches have a matching boundary, we can spot that boundary as an unnatural pattern in the generated world.

Now for the important part: at runtime, we generate an 8x8x8 replacement map for each chunk, but we do it at a 4-block offset in both horizontal directions. The starting point for each chunk's generation is made up of the four pre-generated patches that touch it. The replacement map we generate at runtime has a boundary constraint to "fit in", just like our pre-generated patches.
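Choosing which pre-generated patch goes into each cell of the patch grid has to be random but deterministic, so that neighboring chunks agree on the starting patches they share (a point that comes up again below). Here is a sketch of one way to do that, hashing the cell coordinates together with a world seed (the function names and the exact chunk-to-cell mapping are my own illustration):

```python
import hashlib

def patch_choice(patch_x, patch_z, world_seed, patch_count):
    """Pick one of the pre-generated patches for a cell of the patch grid.
    The choice looks random but depends only on the cell coordinates and the world
    seed, so every chunk that touches this cell arrives at the same answer."""
    key = f"{world_seed}:{patch_x}:{patch_z}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "little") % patch_count

# The four patch-grid cells overlapping a chunk (their exact coordinates depend on
# how the 4-block offset is defined) each get their own deterministic choice:
print([patch_choice(x, z, world_seed=42, patch_count=16)
       for x, z in [(10, 7), (11, 7), (10, 8), (11, 8)]])
```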
However, due to the offset, the boundary that is shared between all pre-generated patches is replaced at runtime, while the area that is different in every pre-generated patch remains unchanged during the runtime generation. (This is needed so that neighboring chunks can be generated independently of each other.) If this replacement map fails to generate, we just copy the blocks from the starting patches. Here is the result of that: Notice how the chunk boundary artifacts from the previous screenshot are gone!

Consider this drawing, where the gray blocks are one chunk (seen from above). We determine the four starting patches that overlap this chunk (the blue boxes). This needs to be random but deterministic, since neighboring chunks will need to use the same information. We query the pre-generated patches at the boundary of the chunk (shown in green) and use this as the boundary constraint for the generation of the chunk. The green boundary area stays the same during runtime generation, but this looks ok due to the variance in the pre-generated patches. The blue boundary is the same for each pre-generated patch, but will be replaced at runtime. Note how this has the properties we want: each chunk can be generated deterministically and independently of other chunks, and if the generation for one chunk fails, we simply copy the blocks from the starting patches.

Using a heightmap

In this section, I'll explain how to generate a world in the shape of an arbitrary heightmap. Consider an integer heightmap where the difference between two adjacent points is always one: the next point is either one above or one below, but never at the same level or anywhere else. Each 2x2 cell in that heightmap has one of these six shapes: For each of these six possible 2x2 cell shapes, we pre-generate a set of starting patches: These starting patches are no longer tiling in the classic sense. Instead, each side matches the opposite side with a vertical offset.

With our special integer heightmap where adjacent points always have a difference of 1, we now generate one chunk for each point in the heightmap. Our query point has four adjacent 2x2 cells. For each 2x2 cell, we determine which of the six possible shapes it has and pick a pre-generated starting patch from the respective collection. Then, we generate a replacement map as explained in the previous section. Here is an example of the heightmap in engine, where each chunk is represented as one flat quad: This mesh is used to render the world far away from the camera. I added a "city like" texture and some billboard-rendered fake buildings. In the foreground, you can see the actual chunks generated by the algorithm:

Okay, now we know how to turn our integer heightmap into a cool-looking infinite WFC world, but how do we get that integer heightmap in the first place? How do we get a function that generates integer elevations where the vertical difference between two adjacent points is always 1 or -1, but never 0 or anything else? We start with a target function that doesn't have this property. In my case, I'm using 8 octaves of Perlin noise, but any heightmap can be used here. Then, we use an elaborate clamping process to force the step constraint onto our target function. It works in a hierarchical way, similar to descending down a quadtree. We start with a relatively large square (the root of the quadtree) and evaluate our target function at the four corners.
Then, we generate heightmap values at the edge centers and the square center by querying the target function and clamping the values to fulfill our slope constraint. The slope constraint requires that the vertical difference is less than or equal to the horizontal difference. If our query point is inside any of the four quadrants, we repeat this process for the respective quadrant (descending the quadtree). If our query point is one of the points we just calculated, we're done. The hierarchical nature of this approach means that it lends itself very well to caching. Here is a 2D visualization of the process: The blue line shows the target function. At every descent step down the quadtree, new limits are introduced to adhere to the slope constraint (shown as black lines). The orange dots are the values of our resulting heightmap.

Outlook and notes

Each chunk can be generated independently, which makes it easy to parallelize the computation required to generate the world. In my case, I'm using Unity's Burst compiler to do the runtime generation. By varying the module probabilities for different areas of the map, I can generate different biomes. Here is an example of a biome boundary: The biome on the left spawns copper roofs and bridges, the one on the right spawns tiled roofs and arches. On the boundary, there is one row of chunks where modules from both biomes can spawn, creating a natural-looking transition.

I also want to mention some progress on this project that is unrelated to the WFC algorithm. Since my last blog post in 2019, I've created lots of new blocks, textured them, added a water plane and added procedurally generated trees and climbing plants. The trees and plants are generated at runtime using the Space Colonization algorithm and adapt to the geometry of the world. The next challenge is to come up with interesting gameplay ideas for this project.
This article provides an overview of the paper "Adversarial Generation of Continuous Implicit Shape Representations", which I co-authored with Matthias Fey. While the paper focuses on the theoretical aspects, I'll provide a higher-level explanation and some visualizations here on the blog. In the paper, we propose a GAN that generates 3D shapes. The GAN uses a DeepSDF network as the generator and either a 3D CNN or a Pointnet as the discriminator. Update June 2020: Here is a recording of the presentation I gave about the paper at the virtual Eurographics 2020 conference.

DeepSDF

First, I'll introduce the idea behind DeepSDF. The standard way of representing 3D shapes in deep learning models uses voxel volumes. They are a generalization of images to 3D space and use voxels instead of pixels. With this 3D CNN approach, concepts from deep learning for images can be applied to 3D shapes. However, CNNs are well suited for learning texture properties, and learning to represent a 3D shape as a combination of 3D texture patches is not ideal. Furthermore, due to the memory requirements of this approach, high voxel resolutions are not feasible.

A voxel volume contains a rasterized representation of the signed distance field of the shape. The signed distance function maps a point in 3D space to a scalar signed distance value. The idea behind the DeepSDF network is to train a neural network to predict the value of the signed distance directly for an arbitrary point in space. Thus, the network learns a continuous representation instead of a rasterized one. An SDF network for a single shape has three input neurons and one output neuron; the DeepSDF network doesn't use convolutions. To learn a representation of multiple shapes, the DeepSDF network receives a latent code as an additional input. The decision boundary of the network is the surface of the learned shape. For a given latent code, a mesh can be created with Marching Cubes by evaluating the network on a raster of points. The resolution of that raster can be chosen arbitrarily after the network has been trained.

The DeepSDF network is trained on a dataset of 3D points with corresponding SDF values. These points are in part uniformly sampled and in part normally distributed around the shape surface, resulting in a high density of training data near the surface. Generating SDF data for the ShapeNet dataset is quite challenging because the dataset contains non-watertight meshes. I made my own implementation of the data generation approach proposed in the DeepSDF paper, as well as a slightly different approach, and published this project as a Python module. I also created my own implementation of the DeepSDF network and trained it on the ShapeNet dataset.

The DeepSDF autodecoder works like an autoencoder, but without the encoder. The latent codes are assigned randomly and then optimized during training using SGD. In the animation above, you see a t-SNE plot of the learned latent space of the DeepSDF autodecoder for the ShapeNet dataset. The colors show dataset categories, which are not known to the network. It has learned on its own to assign similar latent codes to shapes of the same category. The shapes on the left are reconstructions of latent codes along a random spline path through the latent space. Click here for a high-resolution version of the animation.

GAN

So far, I introduced the DeepSDF autodecoder that learns to reconstruct a given dataset of shapes.
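To make the idea of such a network concrete, here is a minimal PyTorch-style sketch (my own simplification, not the architecture from the paper, which is deeper and uses additional tricks such as skip connections): a fully connected network that takes a latent code and an xyz position and outputs a single signed distance value.

```python
import torch
import torch.nn as nn

class TinyDeepSDF(nn.Module):
    """Maps (latent code, xyz point) -> signed distance. Illustrative only."""
    def __init__(self, latent_size=128, hidden=256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(latent_size + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # SDF values live in a small clipped range
        )

    def forward(self, latent_codes, points):
        # latent_codes: (N, latent_size), points: (N, 3) -> (N,) signed distances
        return self.layers(torch.cat([latent_codes, points], dim=1)).squeeze(1)

# Evaluate one shape (one latent code) at a batch of query points:
model = TinyDeepSDF()
latent = torch.randn(1, 128).expand(4096, -1)  # the same code for every point
points = torch.rand(4096, 3) * 2 - 1           # raster or random points in [-1, 1]^3
sdf = model(latent, points)                    # (4096,) predicted signed distances
```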
The contribution of our paper is to propose a GAN architecture that trains a DeepSDF network to generate new shapes. A generative adversarial network is a pair of a generator network and a discriminator network. The generator proposes new samples (in our case shapes) and the discriminator predicts whether a given sample was generated by the generator ("fake") or taken from the dataset ("real"). The trick is that the generator is improved using the gradient of the discriminator output, while the discriminator is trained on the dataset of real samples and the increasingly realistic fake samples provided by the generator. Thus, the generator and discriminator have adversarial goals. If the GAN training is successful, it reaches an equilibrium where the generator has learned a mapping from the latent distribution to the underlying distribution of the dataset and the discriminator assesses generated and real samples with the same score.

3D GANs based on 3D CNNs have been proposed before. Our research question was whether a GAN can be trained where the generator uses the DeepSDF architecture. Usually, the discriminator of a GAN is a mirrored version of the generator. In the case of the DeepSDF network, this is not feasible because a single sample of the DeepSDF network provides only the SDF for one point. From one point alone, a discriminator could not assess whether the sample is realistic. Instead, the discriminator needs multiple points to judge the output values in context.

Voxel discriminator

One solution to the problem of missing context is to use a 3D CNN as the discriminator. In this case, the generator is evaluated for a batch of raster points and the resulting SDF values are rearranged into a voxel volume. The idea of combining a continuous implicit shape network with a 3D CNN was proposed by Chen and Zhang for their autoencoder. The training data for the voxel discriminator are voxel volumes. We use datasets with resolutions 8³, 16³, 32³ and 64³ and apply progressive growing. We use the Wasserstein distance with gradient penalty (WGAN-GP). Here is a latent space interpolation of shapes generated with the voxel discriminator:

Another interesting observation is the generator's ability to generalize from the raster points to intermediate points that it was not trained on. Here is an example of a network that was only trained on 16³ voxels (!), which is the first stage of the progressive growing. On the left, the shapes were reconstructed with 16³ raster points (the same resolution it was trained on). Scaling up the resolution linearly makes the result smoother, but doesn't add any detail. When querying the network at 128³, we see that the network is able to generalize to higher resolutions. For some parts of the geometry, the low-resolution volume has no sample points with negative values, resulting in apparently missing geometry. In these cases, the network still generates negative SDF values for intermediate points, thus making the geometry appear when sampled at a higher resolution.

Pointnet discriminator

Since the voxel-based discriminator keeps some of the disadvantages of fully voxel-based GANs, we investigated an approach that doesn't use voxels at all. The Pointnet is a neural network architecture that can operate on point clouds. In our case, we sample a point cloud of uniform points and use the generator to predict SDF values for these points. The Pointnet then receives the positions of the points and the signed distance values as a "feature vector" with one element.
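To illustrate how the generator output is arranged for the two discriminator types, here is a small sketch (my own illustration reusing the TinyDeepSDF toy model from above, not the paper's code): SDF values are reshaped into a voxel volume for the 3D CNN, or concatenated with the point positions for the Pointnet.

```python
import torch

def voxel_discriminator_input(generator, latent, resolution=16):
    # Evaluate the generator on a regular raster and reshape into a voxel volume.
    axis = torch.linspace(-1, 1, resolution)
    grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
    points = grid.reshape(-1, 3)                                  # (R^3, 3)
    sdf = generator(latent.expand(points.shape[0], -1), points)   # (R^3,)
    return sdf.reshape(1, 1, resolution, resolution, resolution)  # NCDHW volume

def pointnet_discriminator_input(generator, latent, num_points=2048):
    # Evaluate the generator on uniformly sampled points; each point carries its
    # position plus the predicted SDF as a one-element feature.
    points = torch.rand(num_points, 3) * 2 - 1
    sdf = generator(latent.expand(num_points, -1), points)
    return torch.cat([points, sdf.unsqueeze(1)], dim=1)           # (N, 4)
```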
This way, we avoid the fixed raster points and use query points that change constantly. A Pointnet typically infers information from the spatial structure of the points, which in our case is random. Regardless, we found that the Pointnet can be used as a discriminator in our case. To train the GAN with the Pointnet discriminator, we use a ground truth dataset of uniformly sampled points with their corresponding SDF values. We "grow" the point clouds during training simply by increasing the number of points.

When the surface of the generated SDF is reconstructed using Marching Cubes, only values close to sign changes matter. We would like the network to spend more model capacity on values close to the surface, as they influence the result the most. We achieved that by refining the network with additional sample points close to the surface. The gradient of the SDF gives us the direction towards the shape's surface, and for a neural network the gradient is easily computed. Using it, we can move randomly sampled points closer to the surface of the generated shape. The non-uniform point cloud can then be evaluated by the discriminator. Since the Pointnet takes the positions of the points into account, it could discern a uniformly sampled point cloud from a surface point cloud, so we add surface points to the ground truth data as well. Here are some examples of shapes generated with the Pointnet discriminator and uniformly sampled points (top) and shapes generated with the refinement method (bottom). (I'll replace this with an animated version soon.)

For both approaches, we found a surprising result when trying different levels for Marching Cubes. By default, Marching Cubes reconstructs the isosurface at level 0, i.e. the surface where the predicted SDF is zero. However, we can choose another level, effectively eroding or dilating the shape. We would expect the ideal level to be 0, since that is the isosurface of the training data. However, when experimenting with different values, we observe that some features appear to be missing at level 0. We chose an isosurface level of 0.04 to reduce missing geometry at the cost of slightly rounded corners. Since we clip the ground truth SDF at -0.1 and 0.1, the isosurfaces of the generated SDF outside of that range are not usable. The source code for this project is available on Github.
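As a closing technical note, here is a rough sketch of the gradient-based refinement step described above (my own illustration using the toy generator from earlier, not the paper's code): each sample point is moved by its predicted distance against the normalized SDF gradient, which pulls it towards the zero level set.

```python
import torch

def move_points_to_surface(generator, latent, points, steps=2):
    """Pull random sample points towards the surface of the generated shape.
    A point at signed distance d is moved by -d along the normalized gradient."""
    for _ in range(steps):
        points = points.detach().requires_grad_(True)
        sdf = generator(latent.expand(points.shape[0], -1), points)
        grad, = torch.autograd.grad(sdf.sum(), points)
        direction = grad / (grad.norm(dim=1, keepdim=True) + 1e-8)
        points = points - sdf.unsqueeze(1) * direction
    return points.detach()
```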
I was part of a university project group that develops autonomous model race cars. We are a group of twelve students working on the project part-time for a year. We were provided with a car that meets the requirements for the F1/10th competition. Even though competing in F1/10th was not our goal, we kept the rules of the competition in mind. We focused mostly on trying different driving algorithms, which I'll explain below. The software we developed is available on Github. This post is about what we did and what insights I gained from this work.

Hardware

Our car is a regular 1/10 scale RC car where the RC receiver is replaced with a computer. The car is based on a Traxxas Ford Fiesta ST Rally. The computer is an Nvidia Jetson TX2. The car is equipped with a LIDAR scanner with a resolution of 1 ✕ 1080, an IMU and a stereo camera. We ended up not using the stereo camera.

ROS

Our software stack is based on ROS and Gazebo. We used ROS and Gazebo because they are the standard for academic robotics projects; most online materials assume that your project uses ROS. ROS is a robotics framework that facilitates communication between robot components such as sensors and actuators. All the hardware components we use have ROS nodes that publish or subscribe to data in a standardized way. ROS provides a rich ecosystem of software: for all kinds of robot parts and control problems, there are ROS nodes available. You can get pretty far by just tying together existing ROS packages. ROS nodes can be written in C++ and Python 2.7, and since each node is its own process, both languages can be used at the same time. A ROS architecture is built like a distributed system with nodes that subscribe and publish to topics. This is the case even if everything runs on a single computer, as in our project, which creates some overhead. Using ROS means that you'll write lots of configuration files, package definition files, launch files and robot description files that all reference each other. With Gazebo, you'll also need mesh description XML files and world definition files. However, I think that using ROS is worth it for the ecosystem of existing ROS modules.

Gazebo

We use Gazebo as a simulation environment. Our robot car can drive in the real world and in the Gazebo simulation using the same software. Having this simulation turned out to be extremely useful. Since we mostly used the LIDAR, the simulated track is designed with walls enclosing the track. The visual appearance and realism were less important for our use case.

What follows next is my rant about Gazebo. Gazebo is bad software in every way. It is poorly documented, buggy and misses important features. For example, about 20% of the time, Gazebo doesn't launch at all and crashes instead. In the video above, you can see that the car receives shadows, but the track doesn't. See how the car isn't shiny? That's because Gazebo's lighting model doesn't support metallicity. Getting it to render the scene even this decently was super difficult; it would be trivial to get way better looks in an engine like Unity. These are just a few out of many problems we had with Gazebo. Seemingly trivial things are unnecessarily difficult. My takeaway is: avoid using Gazebo if at all possible. Use a game engine instead. Getting a working game engine to simulate a robotics scenario is way easier than getting Gazebo to do what it's supposed to do. For example, there is a project that lets ROS communicate with the Unity engine.
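To give an impression of how little code a ROS node needs in the publish/subscribe style described above, here is a minimal Python sketch (the topic names and the toy steering logic are made up for illustration, not taken from our actual software):

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan
from std_msgs.msg import Float32

def on_scan(scan):
    # Toy logic: steer away from whichever side is closer to a wall.
    half = len(scan.ranges) // 2
    left, right = min(scan.ranges[:half]), min(scan.ranges[half:])
    steering_pub.publish(Float32(0.3 if left < right else -0.3))

if __name__ == "__main__":
    rospy.init_node("toy_driver")
    # Hypothetical topic; a real project would use its own message types.
    steering_pub = rospy.Publisher("/toy/steering", Float32, queue_size=1)
    rospy.Subscriber("/scan", LaserScan, on_scan)
    rospy.spin()  # hand control to ROS; on_scan runs for every LIDAR message
```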
That ROS-Unity integration is what you should use instead; it will save you a lot of headaches. There are some features in Gazebo specific to robot simulation that a game engine doesn't provide out of the box, such as a headless mode.

Now for the interesting part, the autonomous driving algorithms. All algorithms use only LIDAR and IMU data; we didn't use the camera.

SLAM

SLAM stands for simultaneous localization and mapping. It is the concept that a robot determines its own position in an unknown environment based on a map that it creates while exploring that environment. For autonomous racing, this means that the car generates a map of the racetrack and can then calculate and follow an ideal racing path using that map and its position. Since we use ROS, we can use lots of existing SLAM implementations that are built specifically for ROS. In addition, ROS offers the navigation stack, which has implementations for path planning and execution. But working with SLAM in practice turned out to be difficult. The existing SLAM algorithms are mostly designed for slow robots and stop working at higher speeds, which makes them unsuitable for racing. But seeing the map generate looks pretty cool!

Wallfollowing

Wallfollowing is what we call our greedy driving algorithm. It has no concept of the track or the ideal path; it only considers the part of the track that the LIDAR can see from the current position of the car. Our approach is to separate the laser sample points (shown in red) into a left and a right wall and fit two circles into them. The algorithm calculates a predicted position of the car (in the center of the video) and a desired position (in the center of the track). The difference between them (shown in orange) is used as the error for a PID controller, which controls the steering. To control the throttle, we calculate multiple maximum speeds based on the PID error, the curviness of the track and the distance to the next obstacle (shown in yellow). Out of these maximum speeds, we apply the smallest one. This allows us to slow down in curves, cut corners, etc. Teams competing in the F1/10 competition typically use an approach like this. The wallfollowing approach provided the best lap times in our tests. The videos for Gazebo and for SLAM above show the car being controlled by the wallfollowing algorithm. Here is a video of the physical car driving with it:

Reinforcement Learning

One of the methods of autonomous driving we tried during this project group was reinforcement learning. It is an area of machine learning where an agent, in our case a car, learns to select actions given a state in order to optimize a reward function. In particular, we used deep reinforcement learning, where the function that selects an action given a state is a neural network instead of a lookup table. In our case, the state vector was a downscaled version of the LIDAR scan. That means each element of the vector contains the measured distance for a fixed direction. We also experimented with other state information, such as velocity and throttle, but this didn't bring any improvement. In our formulation, the actions are discrete, meaning that the agent selects an action out of a finite set instead of providing numbers for throttle and steering. In the simplest example, we used a fixed velocity and two actions for steering left and right. Straight driving would be achieved by oscillating between left and right. We also tried more granular action sets, but this increases the difficulty of the learning task.
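Here is a small sketch of what that state and action representation can look like (the array sizes and steering values are illustrative, not the exact numbers we used):

```python
import numpy as np

# Discrete action set: a fixed throttle and a handful of steering angles (radians).
ACTIONS = [-0.4, -0.1, 0.0, 0.1, 0.4]

def lidar_to_state(ranges, state_size=64, max_range=10.0):
    """Downscale a raw LIDAR scan (e.g. 1080 distances) to a fixed-size state vector
    by taking the minimum distance inside each angular bucket, then normalizing."""
    ranges = np.clip(np.asarray(ranges, dtype=np.float32), 0.0, max_range)
    buckets = np.array_split(ranges, state_size)
    state = np.array([bucket.min() for bucket in buckets])
    return state / max_range  # values in [0, 1], one per direction

# Example: a fake scan of 1080 samples turned into a 64-element state.
scan = np.random.uniform(0.2, 10.0, size=1080)
print(lidar_to_state(scan).shape, len(ACTIONS))  # (64,) 5
```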
The neural network has one input neuron for each element of the state vector and one output neuron for each action. It predicts a Q value for each action, and the car performs the action with the highest Q value. During training, however, random actions are sometimes taken instead, according to an epsilon-greedy policy (exploration vs. exploitation). For the reward function, it would be possible to use a reward of 1 for each step: there is an implicit reward for not crashing, as this leads to longer episodes and the reward is accumulated over the episode. But the agent learns to stay in the center faster if we give it higher rewards for driving close to the center of the track and lower rewards for driving close to the walls. In addition, we reward high speed. We tested Q-learning and policy gradient methods. Both took hours to learn to drive in the center, but eventually did so decently. Overall, policy gradient worked better than Q-learning. But both were unreliable in the simulation and didn't translate well from simulation to reality. I think we could have achieved significantly better results if we had had more time. This is a video of the car driving in our simulation with policy gradient:

Evolutionary Algorithm

For the evolutionary algorithm, we took the neural networks from our reinforcement learning approach and learned the network parameters with an evolutionary algorithm. Instead of using a set of actions, we let two output neurons control throttle and steering directly. We started with random weights and randomly changed them while measuring the performance of the car in the simulation. Surprisingly, this worked about as well as reinforcement learning, but was easier to implement and took less time to train. However, like reinforcement learning, it did not perform as well as our wallfollowing algorithm.

Outlook

Since our goal was not primarily to participate in the official competition, we investigated several methods of autonomous driving. The least fancy one, our greedy wallfollowing algorithm, turned out to be the fastest. All of the driving methods could probably be improved with more time. There is a follow-up project group that continues to work with our code base. It looks like they are even working on moving away from Gazebo.
Click here for the interactive visualization. The Natural History Museum in London has a data portal in which they provide digital records for many of their specimens. Some of these records have images. I recently learned how to use machine learning tools such as convolutional neural networks, and I wanted to use the NHM data to see what can be done with these tools. The dataset of butterflies seemed particularly interesting to me because the images are visually interesting, yet they are all similar in that they all contain a butterfly in a canonical pose.

Since I'm only interested in learning the visuals of the butterflies and not the rest of the photographs, such as the labels, I needed to find a way to separate the butterflies from the background. Here, you can see one collection of ~2000 butterflies, which are photographed against a clear background: It is quite easy to remove this background automatically based on contrast. However, besides this collection, there are a lot more images of butterflies available. They can be found by searching for "lepidoptera" in the data portal. This way, I downloaded 30 GB of butterflies. Most of the 150000 images have noisy backgrounds with varying brightness, such as this one:

To separate the butterflies from the backgrounds, I trained a U-Net that predicts for each pixel of the image whether it belongs to the butterfly or to the background. It is a convolutional neural network that receives an image as the input and produces a new image as the output, in my case a background-foreground mask. I started by manually creating masks for a few images: I trained the U-Net and tested it on random images from the dataset. From those, I selected the images where the classification went poorly and manually added masks for them to the training set. This selection process leads to a ground truth dataset that contains mostly "challenging" items. I ended up with ~300 masks, but the network works reasonably well even with fewer images. Since it needs to make a prediction for every single pixel, it is forced to generalize even when trained on a single image. Using the masks generated by the U-Net, I can remove the background and crop all the images in the dataset:

Next, I created a 128✕128 image with a white background for each butterfly and trained an autoencoder on them. The autoencoder receives an image as the input and produces a reconstructed image as the output. Its loss function is the difference between the input and the output, so during training it learns to reduce that difference and produce an accurate reconstruction of the input. The autoencoder has a bottleneck in the middle which consists of 128 neurons. This way, it learns to compress each image into a latent vector of 128 numbers. Here are some examples of images reconstructed with the autoencoder: After 6 hours of training, the network hadn't converged yet, meaning that better quality could be achieved by training longer. However, for this project I'm mostly interested in the latent space learned by the autoencoder, not the reconstructed images.

I used the encoder of the autoencoder to calculate the latent vectors for all images in the dataset. To visualize the latent space, I used t-SNE to transform the latent vectors from the 128-dimensional space onto a 2D plane. In this t-SNE plot, each dot corresponds to an image in the dataset. The t-SNE transformation places the points so that points with similar latent vectors, i.e. points that appear similar to the autoencoder, end up close to each other.
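Computing such an embedding takes only a few lines with scikit-learn. A sketch, assuming the latent vectors are stored in a NumPy array (the file names are hypothetical and the perplexity is just a typical default, not necessarily the value I used):

```python
import numpy as np
from sklearn.manifold import TSNE

latent_vectors = np.load("latent_codes.npy")  # shape (num_images, 128), hypothetical file
embedding = TSNE(n_components=2, perplexity=30).fit_transform(latent_vectors)
# embedding has shape (num_images, 2): one x/y position per butterfly image.
np.save("tsne_positions.npy", embedding)
```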
Next, I wanted to replace the dots in the plot with images of the corresponding butterflies. To prevent overlap between very close points, I iteratively moved them apart. This was done by defining a "safe" radius and finding all points that have a neighbor within this radius. Each of these points is then moved away from its closest neighbor by a small distance: Here is a part of the resulting image, where I also added some drop shadows: We can already see that similar butterflies are clustered together.

However, this t-SNE plot is 131000✕131000 pixels in size. That is 17 gigapixels! To display it, I'm using Leaflet, a JavaScript library that can render maps. The image is split into tiles with a resolution of 256✕256 pixels, which are loaded dynamically by Leaflet. I also created tiles for lower zoom levels, so that people can zoom out like on a map. When the map is zoomed all the way out, the images appear as tiny dots. For the intermediate zoom levels, I rendered a few "representative" images. These are selected by applying k-means clustering to the t-SNE points. For each cluster, a representative image is rendered at the cluster center. The representative image is selected by calculating the average of the cluster in latent space (not t-SNE space) and finding the butterfly whose latent vector is closest to that average. I found that when using t-SNE space to determine the representative, an outlier is often selected, which doesn't happen in latent space. The tileset contains 138000 images, which is ~900 MB. This is what it looks like to zoom around in the interactive t-SNE plot: Click here for the interactive version of the t-SNE plot.

I also added the ability to click on the butterflies to show a popup with the original image, the scientific name and some other data. This information is provided in the NHM data portal as a CSV file. I wrote a script that creates a JSON file with the t-SNE positions of the points and the corresponding data. But this metadata file alone is 50 MB in size, so it is not feasible to load it all when someone visits the website. Instead, I split the metadata into smaller chunks using a quadtree, much like the images are split into tiles. Now the metadata is loaded only for the regions that the user looks at.

Using the data from these CSV files, I can color the dots in the t-SNE plot according to the genus of the butterflies: We can see that the autoencoder has learned to separate the butterflies by genus, solely based on the visuals, without being given any information about the genus! This is an example of unsupervised training of a classifier. Similarly, it mostly placed the families in adjacent clusters: There is an area in the plot where lots of colors mix. These are specimens with missing wings. Since a missing wing is the most striking visual feature, it primarily determines the latent vector that the autoencoder assigns. Using the same method for the sex of the specimens, we can see that most of the records contain no sex information. But for the birdwings, where this information is available, the autoencoder has learned to separate male from female, again without being tasked to do so. We can observe a similar result for the delias.

The source code for this project is available on Github. The images of the butterflies are provided by the Trustees of the Natural History Museum under a CC BY 4.0 license.