
Rendering à la Umbral's Shader DSL

Background

If you've done any game or graphics programming, you're most likely familiar with the concept of shaders. Shaders are small programs that run on the GPU, responsible for determining how geometry and pixels get drawn to the screen. A vertex shader runs once per vertex and determines where it ends up in clip space; a fragment shader runs once per pixel and determines its final color. Together, they form a pipeline that takes your geometry and turns it into the image you see.

Typically, shaders are written in a different language than the rest of your game. Your game code could be in C++, but you might be writing your shaders in HLSL, GLSL, WGSL, or some other SL. The main annoyance with this (other than having a bunch of shader files floating around your project and build system) is code duplication. You'll often have to define very similar types in both your CPU and GPU code because there's generally no way to share definitions between the two (without some powerful compile-time source-ry).

Slang actually helps with this a bit; you can export type definitions from your shader code to your C/C++ code. That's cool, but you still have to learn yet another shading language. To solve this, I decided to just have you write umbral for your shader code too.

Prior Art

Now, if you watched Henry Rose's "Game Without an Engine" series, you might have heard of his C compiler, hcc. The primary selling point of this compiler is that you can compile C code to SPIR-V, which is sick! Seeing this, I knew this was exactly the paradigm I wanted for umbral.

Besides the obvious benefit of not learning another shading language, your CPU and GPU code can share the exact same types and layouts, so you don't have to worry about keeping them in sync! I went a step further and allowed both shader helper functions and CPU-callable helpers to be defined on shader types.

The Syntax

Before getting into how it works, let's look at what it looks like. Here's the triangle shader from the lunar examples:

@shader_pod const TriVsOut := struct {
  @builtin(position) clip: math::vec4,
};

@shader_pod const TriFsOut := struct {
  @location(0) color: math::vec4,
};

@shader const TriShader := struct {
  @vs_out vout: TriVsOut,
  @fs_in  fin:  TriVsOut,
  @fs_out fout: TriFsOut,
};

impl TriShader {
  @stage(vertex)
  const vert := fn(&mut self) -> void {
    const vid := @vertex_id();
    var x: f32 = 0.0;
    var y: f32 = 0.0;
    if (vid == 0) {
      x =  0.0; y = -0.5;
    } else if (vid == 1) {
      x = -0.5; y =  0.5;
    } else {
      x =  0.5; y =  0.5;
    }
    self.vout.clip = math::vec4(x, y, 0.0, 1.0);
  };

  @stage(fragment)
  const frag := fn(&mut self) -> void {
    self.fout.color = math::vec4(1.0, 0.5, 0.0, 1.0);
  };
}

There are two categories of shader types: @shader_pod and @shader. A @shader_pod is a plain-old-data struct that represents data flowing between pipeline stages: vertex inputs, vertex outputs, fragment inputs, and fragment outputs. Fields on a @shader_pod are annotated with @location(n) to specify their SPIR-V location, or @builtin(position) to mark the vertex position output.

A @shader struct is the actual shader descriptor. Its fields are annotated with @vs_in, @vs_out, @fs_in, and @fs_out to wire up the @shader_pod types to each stage. Shader stages are then implemented as methods on the @shader struct, marked with @stage(vertex) or @stage(fragment). You read inputs and write outputs through self, just like any other method.

There are also @shader_fn methods: helper functions you can call from within stage methods, useful for sharing logic between stages. And a handful of shader intrinsics cover the things you'd expect: @vertex_id(), @draw_id(), @texture2d(index), @sampler(index), @sample(tex, samp, uv), and @frame_read<T>(offset) for reading typed data out of a GPU-side memory arena.

The How

Henry's hcc features a hand-rolled SPIR-V emission backend. I did not have the same confidence and wanted to bolster my LLVM skills, so I instead leveraged MLIR's SPIR-V dialect, which gives you built-in validation and serialization for free.

The compilation pipeline has five phases: AST to a custom MLIR dialect, lowering to the SPIR-V dialect, serialization to a binary, reflection collection, and finally packing everything into a .umsh file.

AST to MLIR

The first phase lowers the shader's semantic IR into a custom MLIR dialect called um_shader. Rather than immediately targeting SPIR-V, this dialect captures high-level shader concepts: um_shader.load_input, um_shader.store_output, um_shader.texture2d, um_shader.sample, um_shader.frame_read, and so on. This keeps the translation from the AST simple and lets the SPIR-V lowering handle the wackier structural requirements of SPIR-V.

IO field accesses on self are detected and converted to explicit load/store ops. So self.vout.clip = ... becomes a um_shader.store_output "vout", "clip", <value> operation, and self.vin.pos becomes a um_shader.load_input "vin", "pos". Mutable variables (var x: f32) are lowered to memref.alloca with Function storage class, and all allocas are hoisted to the function entry block, which is a SPIR-V requirement. Vector construction from scalars goes through a sequence of vector.insert ops. Nothing too surprising.
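To make the IO rewrite above concrete, here is a small Python sketch of the pattern match. It is not umbral's compiler code: the `IO_SLOTS` table, the `Op` record, and the `lower_access` function are hypothetical names invented for illustration, and rejecting a store to an input slot is my own assumption about how misuse would be handled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of IO slots declared on the @shader struct.
IO_SLOTS = {"vin": "input", "vout": "output", "fin": "input", "fout": "output"}


@dataclass(frozen=True)
class Op:
    name: str   # op name, e.g. "um_shader.store_output"
    slot: str   # IO field on self, e.g. "vout"
    field: str  # member of the pod type, e.g. "clip"


def lower_access(path: str, is_store: bool) -> Optional[Op]:
    """Turn 'self.<slot>.<field>' into an explicit load/store op.

    Returns None when the path is not an IO access, so the caller can
    fall back to ordinary member lowering.
    """
    parts = path.split(".")
    if len(parts) != 3 or parts[0] != "self" or parts[1] not in IO_SLOTS:
        return None
    direction = IO_SLOTS[parts[1]]
    wanted = "output" if is_store else "input"
    if direction != wanted:
        # Assumption: direction mismatches are treated as errors.
        raise ValueError(f"slot {parts[1]!r} is an {direction}, not an {wanted}")
    name = "um_shader.store_output" if is_store else "um_shader.load_input"
    return Op(name, parts[1], parts[2])
```

With this sketch, `lower_access("self.vout.clip", True)` yields a `um_shader.store_output` record for slot `vout` and field `clip`, while a path such as `x` returns `None` and is left to the regular lowering.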

MLIR to SPIR-V

The second phase lowers the um_shader dialect to MLIR's built-in spirv dialect. One spirv.module is emitted per shader stage. Each IO field becomes a spirv.GlobalVariable with the appropriate storage class (Input or Output) and its location or builtin decoration. Bindless texture and sampler arrays become UniformConstant runtime arrays. Frame memory and draw packet buffers become SSBOs.

Helper functions (@shader_fn) are cloned into every stage module that calls them, since each stage is its own self-contained spirv.module and cannot call a function defined in another module.
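The cloning step boils down to a reachability walk over the call graph, one walk per stage. The Python sketch below shows that idea under my own assumptions: the call graph is a plain dict from function name to callees, and `reachable_helpers` and `build_stage_modules` are hypothetical names, not part of umbral.

```python
def reachable_helpers(entry, calls):
    """Return every helper reachable from `entry`, directly or transitively."""
    seen = set()
    stack = list(calls.get(entry, []))
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(calls.get(fn, []))
    return sorted(seen)


def build_stage_modules(stages, calls):
    """Map each stage to the functions its module must contain.

    A helper used by two stages shows up in both lists, which mirrors
    cloning it into each stage module.
    """
    return {stage: [stage] + reachable_helpers(stage, calls) for stage in stages}
```

For a call graph where `vert` calls `tint` and `frag` calls `tint` and `gamma`, both stage modules end up with their own copy of `tint`.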

The Vulkan target is SPIR-V 1.3 with capabilities for bindless descriptors (RuntimeDescriptorArray, SampledImageArrayDynamicIndexing), draw parameters, and matrix operations.

Serialization and the Sampler Problem

This is where things get a little interesting. MLIR's SPIR-V dialect is missing support for OpTypeSampler and OpSampledImage. (Here's the relevant serialization code if you want to follow along.) These are the ops you need to combine a texture and a sampler before you can sample from them in Vulkan. Without them, you simply cannot implement texture sampling through the standard MLIR serialization path.

The workaround is a post-processing pass on the serialized binary. Before serialization, OpSampledImage ops are replaced with spirv.CompositeConstruct ops (which happen to serialize to the same number of words). Sampler variables are typed as i32 placeholders. After standard serialization, the binary is walked and patched: OpCompositeConstruct instructions with a SampledImage result type get their opcode rewritten to OpSampledImage, and the sampler type chain (OpTypeInt 32 in the sampler descriptor array) gets replaced with OpTypeSampler (opcode 26). The id_bound in the SPIR-V header is updated accordingly.
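Here is a rough Python sketch of the opcode half of that patch, to show how little is involved once you walk the word stream. It only covers rewriting OpCompositeConstruct to OpSampledImage; the sampler type replacement and the id_bound update are left out. The opcode numbers and the five-word header come from the SPIR-V specification, while the function names are hypothetical and this is not the actual umbral pass.

```python
SPIRV_MAGIC = 0x07230203
HEADER_WORDS = 5              # magic, version, generator, bound, schema
OP_TYPE_SAMPLED_IMAGE = 27
OP_COMPOSITE_CONSTRUCT = 80
OP_SAMPLED_IMAGE = 86


def instructions(words):
    """Yield (index, opcode, word_count) for each instruction after the header."""
    i = HEADER_WORDS
    while i < len(words):
        word_count = words[i] >> 16
        opcode = words[i] & 0xFFFF
        if word_count == 0:
            raise ValueError(f"malformed instruction at word {i}")
        yield i, opcode, word_count
        i += word_count


def patch_sampled_images(words):
    """Return a copy of the module with placeholder ops turned into OpSampledImage."""
    if len(words) < HEADER_WORDS or words[0] != SPIRV_MAGIC:
        raise ValueError("not a SPIR-V module")
    # Pass 1: remember the result ids of every OpTypeSampledImage.
    sampled_types = {
        words[i + 1]
        for i, opcode, _ in instructions(words)
        if opcode == OP_TYPE_SAMPLED_IMAGE
    }
    # Pass 2: rewrite 5-word OpCompositeConstruct ops whose result type
    # (the word right after the instruction header) is one of those ids.
    out = list(words)
    for i, opcode, word_count in instructions(words):
        if (opcode == OP_COMPOSITE_CONSTRUCT and word_count == 5
                and words[i + 1] in sampled_types):
            out[i] = (word_count << 16) | OP_SAMPLED_IMAGE
    return out
```

Because both instructions are five words long when there are two operands, the rewrite touches a single word per instruction and no offsets shift.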

It is not pretty, but it works, and it is fully validated by the Vulkan runtime. I am also in the process of contributing proper OpTypeSampler support upstream to LLVM: llvm-project#189891. If that lands, we can get rid of all this silly binary patching.

The serialization pass also handles two decoration fixups that the MLIR dialect does not emit automatically: integer fragment inputs need a Flat decoration (required by Vulkan to disable interpolation), and SSBO struct members need NonWritable decorations when the fragmentStoresAndAtomics device feature is not enabled.

Reflection and Packing

After serialization, a reflection pass walks the @vs_in shader pod type and collects the vertex attribute layout: stride, input rate, and per-attribute location, Vulkan format, and byte offset. This lets the runtime set up the vertex input state without you having to describe it separately.
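A reflection pass of this kind is mostly bookkeeping. The Python sketch below computes offsets and stride for a list of attributes, assuming fields are tightly packed in declaration order with no padding and a per-vertex input rate. The `FORMATS` table and `reflect_vertex_layout` are hypothetical; only the VkFormat names are real Vulkan identifiers.

```python
# Hypothetical mapping from field type to (Vulkan format name, size in bytes).
FORMATS = {
    "f32": ("VK_FORMAT_R32_SFLOAT", 4),
    "vec2": ("VK_FORMAT_R32G32_SFLOAT", 8),
    "vec3": ("VK_FORMAT_R32G32B32_SFLOAT", 12),
    "vec4": ("VK_FORMAT_R32G32B32A32_SFLOAT", 16),
}


def reflect_vertex_layout(fields):
    """Build a vertex layout from (location, type_name) pairs.

    Assumes tight packing: each attribute starts where the previous one
    ended, and the stride is the sum of all attribute sizes.
    """
    attributes = []
    offset = 0
    for location, type_name in fields:
        fmt, size = FORMATS[type_name]
        attributes.append({"location": location, "format": fmt, "offset": offset})
        offset += size
    return {"stride": offset, "input_rate": "vertex", "attributes": attributes}
```

A pod with a vec3 at location 0 and a vec4 at location 1 would come out with a 28-byte stride and the second attribute at byte offset 12.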

Everything gets packed into a .umsh file: a simple binary format with a magic number, a section table, a STAGES section containing the serialized SPIR-V for each stage, and an optional REFL section with the vertex attribute layout. The runtime loads this file, hands the SPIR-V to Vulkan, and uses the reflection data to configure the pipeline automatically.
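For a feel of what a container like this involves, here is a Python sketch of a magic number, a section table, and payloads. Everything about the byte layout is an assumption made for illustration: the `b"UMSH"` magic, the 8-byte padded tags, and the little-endian u32 offset and size fields are not taken from the real .umsh format. Only the section names STAGES and REFL come from the description above.

```python
import struct

MAGIC = b"UMSH"            # hypothetical magic value
ENTRY = struct.Struct("<8sII")  # assumed table entry: tag, offset, size


def pack_container(sections):
    """Pack (tag, payload) pairs as: magic, u32 count, section table, payloads."""
    header_size = len(MAGIC) + 4 + ENTRY.size * len(sections)
    table = b""
    body = b""
    offset = header_size
    for tag, payload in sections:
        # "8s" pads short tags with NUL bytes up to 8 bytes.
        table += ENTRY.pack(tag.encode("ascii"), offset, len(payload))
        body += payload
        offset += len(payload)
    return MAGIC + struct.pack("<I", len(sections)) + table + body


def unpack_container(blob):
    """Read the section table back into a dict of tag -> payload bytes."""
    if blob[:4] != MAGIC:
        raise ValueError("bad magic")
    (count,) = struct.unpack_from("<I", blob, 4)
    sections = {}
    for i in range(count):
        tag, offset, size = ENTRY.unpack_from(blob, 8 + ENTRY.size * i)
        sections[tag.rstrip(b"\0").decode("ascii")] = blob[offset:offset + size]
    return sections
```

A table of absolute offsets lets a loader jump straight to the section it needs and makes an optional section like REFL a matter of simply leaving out one entry.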

On the umbral side, referencing a shader in CPU code is done with the @shader_ref(TypeName) intrinsic, which resolves at compile time to the FNV-1a hash of "TypeName.umsh", the same hash the runtime uses to look up the shader asset.
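FNV-1a itself is only a few lines. The Python sketch below uses the standard offset basis and prime for the 32-bit and 64-bit variants. The text does not say which width umbral uses, so the 32-bit default is an assumption, as are the UTF-8 encoding of the name and the `shader_ref` helper, which only mimics what the intrinsic is described as doing.

```python
# Standard FNV-1a parameters: bits -> (offset basis, prime).
FNV_PARAMS = {
    32: (0x811C9DC5, 0x01000193),
    64: (0xCBF29CE484222325, 0x100000001B3),
}


def fnv1a(data: bytes, bits: int = 32) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h, prime = FNV_PARAMS[bits]
    mask = (1 << bits) - 1
    for byte in data:
        h ^= byte
        h = (h * prime) & mask
    return h


def shader_ref(type_name: str, bits: int = 32) -> int:
    """Hash '<TypeName>.umsh' (width and encoding are assumptions)."""
    return fnv1a(f"{type_name}.umsh".encode("utf-8"), bits)
```

Since the compiler and the runtime hash the same string with the same function, the two sides agree on an asset key without any lookup table being shipped.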

Putting It All Together: the Cube Example

To make this concrete, let's walk through the cube example. It demonstrates a simple form of GPU-driven rendering: there is no vertex buffer. All geometry and per-draw data lives in a GPU-visible memory arena, written by the CPU and read by the shader directly via SSBOs.

On the CPU side, two structs describe the data layout:

const CubeVertex := struct {
  px: f32, py: f32, pz: f32,
  cr: f32, cg: f32, cb: f32, ca: f32,
};

const CubeDrawData := struct {
  mvp: math::mat4,
  verts: [36]CubeVertex,
};

CubeDrawData packs an MVP matrix (64 bytes) followed by all 36 vertices of the cube (28 bytes each: 3 floats position, 4 floats color). This is a plain umbral struct; no shader annotations, no special treatment. Every frame, the CPU allocates space for it in the frame arena and writes it directly:

const fa := gfx::frame_alloc(dev, @size_of(CubeDrawData), 256);
const data := @as(fa.ptr, &mut CubeDrawData);
*data = CubeDrawData { mvp = mvp, verts = verts };

gfx::frame_alloc returns a pointer into GPU-visible memory and a byte offset into the frame arena SSBO. That offset gets stored in a DrawPacket and pushed to the draw stream:

ds.push_draw_packet(types::DrawPacket {
  vertex_count     = 36,
  instance_count   = 1,
  draw_data_offset = fa.offset,
});

On the GPU side, the vertex shader reads it all back. @draw_packet(@draw_id()) fetches the draw packet for the current draw call, giving us the draw_data_offset. From there, the shader manually computes byte offsets and reads the data out of the frame arena:

@stage(vertex)
const vert := fn(&mut self) -> void {
  const pkt  := @draw_packet(@draw_id());
  const base := pkt.draw_data_offset;

  const mvp  := @frame_read<math::mat4>(base);
  const vid  := @vertex_id();
  const v_off := base + 64 + vid * 28;
  const pos  := @frame_read<math::vec3>(v_off);
  const color := @frame_read<math::vec4>(v_off + 12);

  self.vout.clip  = mvp * math::vec4(pos.x, pos.y, pos.z, 1.0);
  self.vout.color = color;
};

I am still experimenting with more ergonomic ways of calculating byte offsets within the shader. The current leading idea is having @frame_read consume the bytes and bump an offset into the buffer.

@frame_read<math::mat4>(base) reads 16 consecutive floats (64 bytes) starting at base. Each vertex starts at v_off = base + 64 + vid * 28; the position is the first 12 bytes (vec3), and the color is the next 16 bytes (vec4) at v_off + 12. The MVP multiply and output write are just regular umbral expressions.
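The offset arithmetic is easy to check on the CPU. The Python sketch below packs a CubeDrawData-shaped blob with the struct module and reads a vertex back with the same formulas the shader uses. It assumes little-endian floats and tight packing with no padding, which matches the 64-byte and 28-byte sizes quoted above. The `frame_read` function here is a stand-in for the intrinsic, not its implementation.

```python
import struct

MAT4_BYTES = 16 * 4     # mvp: 16 floats
VERTEX_BYTES = 7 * 4    # 3 floats position + 4 floats color
VERTEX_COUNT = 36


def pack_draw_data(mvp, verts):
    """Pack 16 mvp floats followed by 36 vertices of 7 floats each."""
    assert len(mvp) == 16 and len(verts) == VERTEX_COUNT
    blob = struct.pack("<16f", *mvp)
    for vertex in verts:
        blob += struct.pack("<7f", *vertex)
    return blob


def frame_read(arena, offset, n_floats):
    """Stand-in for @frame_read<T>: read n_floats floats at a byte offset."""
    return struct.unpack_from(f"<{n_floats}f", arena, offset)


def read_vertex(arena, base, vid):
    """Mirror the shader: v_off = base + 64 + vid * 28."""
    v_off = base + MAT4_BYTES + vid * VERTEX_BYTES
    pos = frame_read(arena, v_off, 3)
    color = frame_read(arena, v_off + 12, 4)
    return pos, color
```

The packed blob is 64 + 36 * 28 = 1072 bytes, and any mismatch between the CPU struct and the shader's offsets shows up immediately as garbage positions or colors.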

The whole thing compiles down to a handful of SSBO AccessChain + Load ops in SPIR-V, a matrix-vector multiply, and two output stores. No vertex buffer binding, no input assembly, no descriptor set management on the CPU side beyond what the runtime handles automatically. You define the data layout as a normal struct, write it like normal memory, and read it back in the shader using the same type system.

Conclusion

Designing and implementing umbral's shader DSL has so far been my favorite part of the project. While the idea itself isn't completely unique, I think the utilization of MLIR's SPIR-V dialect in this way is at least a bit novel. I also enjoyed getting to apply new skills I learned from work to my personal project. Usually it's the other way around for me. I'd definitely like to explore other use cases of MLIR in this way (one idea I had was a DAW dialect for making music programmatically). LLVM once again proves it’s worth its weight in gold.
