A million triangles

Sixth Katana post, and by far the one I spent the most time on. The goal: render the filament in 3D. Fat tubes following every extrusion move, the way good slicers like Bambu Studio or PrusaSlicer do. Spoiler - it’s hard.

A single layer of a mid-sized model can easily have a few thousand segments, split between perimeters and infill, and a whole print can run well into the millions. If I naively turn each segment into a cylinder mesh, that’s tens of millions of triangles per frame just for the filament. That doesn’t work. I figured out most of what follows by building the naive version first and watching it explode.

First attempt: just draw the cylinders

The dumb version: for every extrusion move, generate a cylinder mesh along that segment (32 vertices around the circumference + 2 end caps), stuff all the vertices into one big buffer, upload it to the GPU, draw it. This didn’t look good even on very small models, because wherever two segments meet at a corner the flat cylinder ends leave a visible hard joint instead of a smooth bend. To hide that, I also added a “ball” geometry between the cylinders to fake the curve, which added a lot more GPU load on top.
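To put a number on the naive approach, here is a rough Python sketch of what one of those cylinders costs. It is not Katana’s actual mesh code: `cylinder_mesh`, the fan-style caps and the 500,000-segment print are all assumptions made up for illustration.

```python
import math

def cylinder_mesh(start, end, radius, sides=32):
    """Naive tube for one horizontal extrusion move: two rings plus fan caps."""
    (x0, y0, z0), (x1, y1, _) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    tx, ty = (x1 - x0) / length, (y1 - y0) / length  # unit tangent in XY
    bx, by = -ty, tx                                  # unit bitangent in XY

    verts = []
    for px, py in ((x0, y0), (x1, y1)):               # one ring per segment end
        for i in range(sides):
            angle = 2.0 * math.pi * i / sides
            c, s = radius * math.cos(angle), radius * math.sin(angle)
            verts.append((px + bx * c, py + by * c, z0 + s))

    tris = []
    for i in range(sides):                            # side wall: 2 triangles per quad
        j = (i + 1) % sides
        tris.append((i, j, sides + i))
        tris.append((j, sides + j, sides + i))
    for base in (0, sides):                           # end caps as triangle fans
        for i in range(1, sides - 1):                 # (winding not normalized here)
            tris.append((base, base + i, base + i + 1))
    return verts, tris

verts, tris = cylinder_mesh((0.0, 0.0, 0.2), (10.0, 0.0, 0.2), radius=0.2)
segments = 500_000  # hypothetical print size
print(len(verts), "vertices,", len(tris), "triangles per segment")
print(len(tris) * segments, "triangles for the whole print")
```

With these assumptions each segment is 64 vertices and 124 triangles, so half a million segments already lands at 62 million triangles, before counting the balls at the joints.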

The liver is impossible to preview at this point, btw. I added an FPS counter and it’s actually below 1, into “seconds per frame” territory.

Instanced rendering

The first obvious improvement is instanced rendering. We upload just 1 cylinder mesh (and 1 sphere mesh), then for each segment upload one struct per instance containing its parameters (start, direction, length, radius, color). Then we issue a single draw call that tells the GPU “draw this prototype N times, and use the per-instance data to transform each copy into position.”
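As a rough idea of what that per-instance struct looks like on the CPU side, here is a small Python sketch that packs segments into a tightly packed byte buffer. The layout (`INSTANCE_FORMAT`) and the helper names are assumptions for illustration, not Katana’s real code; the field order just follows the list above.

```python
import math
import struct

# Hypothetical layout: start (3 floats), unit direction in XY (2 floats),
# (length, radius) (2 floats), RGBA color (4 floats). Little-endian, no padding.
INSTANCE_FORMAT = "<3f2f2f4f"
INSTANCE_SIZE = struct.calcsize(INSTANCE_FORMAT)  # 11 floats = 44 bytes

def pack_instance(start, end, radius, color):
    """Pack one extrusion move into the per-instance layout."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("zero-length segment has no direction")
    return struct.pack(INSTANCE_FORMAT, *start,
                       dx / length, dy / length,
                       length, radius, *color)

def build_instance_buffer(segments, radius, color):
    """Concatenate all instances into one buffer, ready for a single upload."""
    return b"".join(pack_instance(s, e, radius, color) for s, e in segments)

moves = [((0.0, 0.0, 0.2), (3.0, 4.0, 0.2)),
         ((3.0, 4.0, 0.2), (3.0, 9.0, 0.2))]
buf = build_instance_buffer(moves, radius=0.2, color=(1.0, 0.5, 0.0, 1.0))
print(INSTANCE_SIZE, "bytes per segment,", len(buf), "bytes total")
```

Under this layout a segment costs 44 bytes. A 32-sided tube built per segment needs 64 vertices, which is 768 bytes for positions alone.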

In the vertex shader, each instance rotates and scales the prototype into world space:

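// Per-vertex position of the prototype mesh (location assumed)
layout (location = 0) in vec3 a_pos;

// Per-instance attributes
layout (location = 2) in vec3 a_inst_start;
layout (location = 3) in vec2 a_inst_dir;    // unit direction in XY
layout (location = 4) in vec2 a_inst_scale;  // (length, radius)
layout (location = 5) in vec4 a_inst_color;

void main() {
    float seg_len = a_inst_scale.x;
    float radius  = a_inst_scale.y;

    // Local frame of the segment: moves are horizontal, so "up" is always +Z
    vec3 tangent   = vec3(a_inst_dir, 0.0);
    vec3 bitangent = vec3(-a_inst_dir.y, a_inst_dir.x, 0.0);
    vec3 up        = vec3(0.0, 0.0, 1.0);

    // Stretch the prototype along X to the segment length and scale its
    // cross-section to the filament diameter
    vec3 scaled = vec3(a_pos.x * seg_len,
                       a_pos.y * radius * 2.0,
                       a_pos.z * radius * 2.0);
    // Shift so the tube begins at the segment start instead of straddling it
    scaled.x += seg_len * 0.5;

    // Place the scaled vertex in world space using the segment's frame
    vec3 world_pos = a_inst_start
        + tangent   * scaled.x
        + bitangent * scaled.y
        + up        * scaled.z;
    // ...
}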
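To sanity-check that math outside the GPU, here is the same transform mirrored in Python. It assumes (this is inferred from the `* 2.0` and the `+ seg_len * 0.5`, not stated anywhere) that the prototype cylinder is centered on the origin, one unit long along X and one unit in diameter; `instance_to_world` is a made-up name.

```python
def instance_to_world(a_pos, inst_start, inst_dir, inst_scale):
    """CPU mirror of the vertex shader's per-instance transform."""
    seg_len, radius = inst_scale
    tangent = (inst_dir[0], inst_dir[1], 0.0)
    bitangent = (-inst_dir[1], inst_dir[0], 0.0)
    up = (0.0, 0.0, 1.0)

    # Assumed prototype: x in [-0.5, 0.5], cross-section radius 0.5
    sx = a_pos[0] * seg_len + seg_len * 0.5
    sy = a_pos[1] * radius * 2.0
    sz = a_pos[2] * radius * 2.0

    return tuple(inst_start[i] + tangent[i] * sx + bitangent[i] * sy + up[i] * sz
                 for i in range(3))

start, direction, scale = (1.0, 2.0, 0.2), (0.0, 1.0), (4.0, 0.2)
print(instance_to_world((-0.5, 0.0, 0.0), start, direction, scale))  # back end of the tube
print(instance_to_world((0.5, 0.0, 0.0), start, direction, scale))   # front end of the tube
print(instance_to_world((0.0, 0.0, 0.5), start, direction, scale))   # top of the tube, mid-segment
```

If the assumption holds, the back of the prototype lands exactly on the segment start, the front lands on start + direction × length, and the top of the cross-section sits one radius above the path.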

This alone improved the framerate drastically, but it still wasn’t enough.

Turning impostor syndrome into impostor billboard

Another issue at this point was memory usage. The sheer amount of geometry being rendered made the program freeze solid. I tried simplifying the geometry at first, but it got very ugly, so then I tried a trick called Impostor Billboard Rendering.

This is the gist: for each segment, emit a single quad (4 vertices, no prototype mesh) and orient it towards the camera. In the fragment shader, figure out where the ray through this pixel would hit the implied cylinder (most of the complexity goes in this math), shade it as if the cylinder were really there, and write the correct depth into gl_FragDepth so the impostor intersects with neighboring geometry correctly.
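The core of that fragment shader math is a ray-cylinder intersection. Here is a minimal sketch in Python (a real version would live in GLSL), assuming a finite cylinder described by a base point, a unit axis, a radius and a length. All the names are hypothetical and the end caps are ignored.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def add_scaled(a, b, s):
    return (a[0] + b[0] * s, a[1] + b[1] * s, a[2] + b[2] * s)

def ray_cylinder(origin, direction, base, axis, radius, length):
    """Nearest hit of a ray on the side of a finite cylinder.

    `axis` must be unit length. Returns (t, hit, normal) or None.
    Caps are ignored, and only the near root is considered.
    """
    oc = sub(origin, base)
    # Drop the components along the axis: the problem becomes a 2D circle test
    d_perp = add_scaled(direction, axis, -dot(direction, axis))
    oc_perp = add_scaled(oc, axis, -dot(oc, axis))

    a = dot(d_perp, d_perp)
    if a < 1e-12:                      # ray parallel to the axis
        return None
    b = 2.0 * dot(d_perp, oc_perp)
    c = dot(oc_perp, oc_perp) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                     # ray misses the infinite cylinder
        return None

    t = (-b - math.sqrt(disc)) / (2.0 * a)
    if t < 0.0:                        # hit is behind the ray origin
        return None
    hit = add_scaled(origin, direction, t)
    s = dot(sub(hit, base), axis)      # position along the axis
    if s < 0.0 or s > length:          # beyond the ends of the segment
        return None
    radial = add_scaled(sub(hit, base), axis, -s)
    normal = tuple(n / radius for n in radial)
    return t, hit, normal

print(ray_cylinder((0.0, -5.0, 0.0), (0.0, 1.0, 0.0),
                   (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0, 2.0))
```

In the shader, a `None` here would be a `discard`. The hit point is what gets projected to produce the depth for gl_FragDepth, and the normal feeds the lighting.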

[Image: impostor billboard example]

Impostors made the preview run smooth as butter! But it still didn’t look right.

A few days later I backed the approach out: you could see the “billboards” flickering through each other depending on how you rotated the camera (actual cylinders don’t have point-depth). Around the same time I realized tubes aren’t even really the right shape! I zoomed in really close in Bambu Studio, trying to figure out how the hell they can display the liver so effortlessly, and saw that what they draw isn’t a cylinder. They show flattened rhombuses all the way, so I did the same. The performance gain wasn’t as large as with the billboard trick, but at least it looked right.
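For a sense of how cheap that shape is, here is a rough Python sketch of a rhombus-section prism for one segment. It assumes the cross-section is as wide as the extrusion width and as tall as the layer height. That is my guess at the proportions, not something measured from Bambu Studio, and `rhombus_prism` is a hypothetical helper.

```python
import math

def rhombus_prism(start, end, width, height):
    """Open-ended prism with a flattened rhombus cross-section along one move."""
    (x0, y0, z0), (x1, y1, _) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    bx, by = -(y1 - y0) / length, (x1 - x0) / length  # unit bitangent in XY
    hw, hh = width / 2.0, height / 2.0

    verts = []
    for px, py in ((x0, y0), (x1, y1)):               # 4 corners per segment end
        verts += [
            (px + bx * hw, py + by * hw, z0),         # left
            (px, py, z0 + hh),                        # top
            (px - bx * hw, py - by * hw, z0),         # right
            (px, py, z0 - hh),                        # bottom
        ]

    tris = []
    for i in range(4):                                # 4 side quads, 2 triangles each
        j = (i + 1) % 4
        tris.append((i, j, 4 + i))
        tris.append((j, 4 + j, 4 + i))
    return verts, tris

verts, tris = rhombus_prism((0.0, 0.0, 0.1), (10.0, 0.0, 0.1), width=0.4, height=0.2)
print(len(verts), "vertices,", len(tris), "triangles per segment")
```

That is 8 vertices and 8 triangles per segment, a small fraction of a 32-sided tube, and the flat facets read reasonably well as squished filament once lit.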

Frustum culling

Another thing I tried implementing was frustum culling, which basically means only rendering what we can see. Remember when I talked a few posts back about “clip space”? This is where it pays off, because once the vertices are projected into clip space, a bit of math tells us whether something should be rendered or not. I used a similar idea here to skip uploading layers to the GPU when their bounding boxes fall outside the view frustum.
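One common way to do that bounding box test, sketched in Python: project the 8 corners of a layer’s box into clip space and cull the layer only if every corner is outside the same frustum plane. This assumes OpenGL-style clip space (visible when -w ≤ x, y, z ≤ w) and a row-major 4x4 matrix. `perspective`, `mat_vec` and `aabb_outside_frustum` are hypothetical names, not Katana’s API.

```python
import math
from itertools import product

def perspective(fovy_deg, aspect, near, far):
    """Standard OpenGL-style projection, camera at origin looking down -Z."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

# One predicate per frustum plane: true when a clip-space point is outside it
OUTSIDE_TESTS = (
    lambda c: c[0] < -c[3], lambda c: c[0] > c[3],   # left, right
    lambda c: c[1] < -c[3], lambda c: c[1] > c[3],   # bottom, top
    lambda c: c[2] < -c[3], lambda c: c[2] > c[3],   # near, far
)

def aabb_outside_frustum(view_proj, lo, hi):
    """True if the box is certainly invisible. Conservative: may keep some
    boxes that are actually off-screen, but never culls a visible one."""
    corners = [mat_vec(view_proj, (x, y, z, 1.0))
               for x, y, z in product(*zip(lo, hi))]
    return any(all(test(c) for c in corners) for test in OUTSIDE_TESTS)

vp = perspective(60.0, 1.0, 0.1, 100.0)  # identity view matrix assumed
print(aabb_outside_frustum(vp, (-1, -1, -6), (1, 1, -4)))   # box in front of the camera
print(aabb_outside_frustum(vp, (99, -1, -6), (101, 1, -4))) # box far off to the right
```

The first box is kept and the second is culled. Testing per layer rather than per segment keeps the CPU cost tiny, since there are only as many boxes as layers.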

It was a nice learning experience, but didn’t really get me a lot of FPS, because we’re usually seeing the whole thing when we’re looking at a 3D print preview, so it’s only significant in some edge scenarios.

Where we are

Six posts in, Katana can slice an STL, generate perimeters and infill, plan the toolpath, and render the filament in a fluid-ish 3D viewer. It’s still not usable, though. There’s a lot of necessary stuff missing:

Supports (definitely a fun one)
G-Code generation
More types of infill patterns (and compose the code better so they’re interchangeable)
Horizontal slider for displaying the segment order within a layer
…A bunch of other 3D printing features like retraction, seam placement, printing speed and flow, skirt loops, brim, etc.

And this is only for a normal/simple slicer. I’m not even delving into the world of non-planar slicing, ironing, multi-color, anti-aliasing, etc. At least not yet :D

But my next step is actually a big refactor of katana-viewer. I really got into the rendering shenanigans and I want to implement a proper WebGPU renderer to have more control over the GPU pipeline, and learn more in the process (and maybe give the impostor billboard a second chance, properly this time).

So the posts will end here for now. If I feel like continuing the series later, I’ll do it!

Thanks for reading! Hit me up if you wanna talk about it.