Graphics Debugging with an LLM
Intro
I use Claude Code while developing umbral to automate some of the more boilerplate-y work that comes with writing a compiler/game engine (CMake configs, big enums, Vulkan incantations, docstrings, etc.). I also use it for troubleshooting issues I run into.
The problem with debugging graphics issues this way is that CC can't see your screen (it's not Windows-sponsored spyware, after all). The flow would be: CC attempts a fix, runs the app, and asks me what I saw. I'd reply "nothing," which tells it that something is wrong but gives it no real signal about what.
Teaching It to See
I've used RenderDoc in the past to debug graphical issues, so I thought, "Hm, maybe it could use the RenderDoc CLI to 'see' the problem." Due to some configuration issues, I couldn't get it to drive the CLI. Instead, I made a debug output render target and serialized that to disk as a .PPM file. Once I did this, CC was able to parse the file and get a better idea of what was on the screen.
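To give a sense of why PPM is easy for a script to consume, here is a minimal sketch of a reader. It assumes the dump is an 8-bit binary PPM (magic P6, maxval 255); the post doesn't say which PPM variant the engine writes, and read_ppm is a hypothetical name, not a function from the actual tooling.

```python
def read_ppm(data: bytes):
    """Parse an 8-bit binary (P6) PPM image held in memory.

    Returns (width, height, pixels), where pixels is a list of rows and
    each row is a list of (r, g, b) tuples.
    """
    tokens = []
    pos = 0
    # The header is four whitespace-separated tokens:
    # magic, width, height, maxval.
    while len(tokens) < 4:
        # Skip whitespace between tokens.
        while data[pos:pos + 1].isspace():
            pos += 1
        # Skip '#' comment lines, which may appear in the header.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # one whitespace byte separates the header from the pixel data

    magic = tokens[0]
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6" or maxval != 255:
        raise ValueError("this sketch only handles P6 with maxval 255")

    body = data[pos:pos + width * height * 3]
    if len(body) != width * height * 3:
        raise ValueError("truncated pixel data")

    pixels = []
    for y in range(height):
        row = []
        for x in range(width):
            i = (y * width + x) * 3
            row.append((body[i], body[i + 1], body[i + 2]))
        pixels.append(row)
    return width, height, pixels
```

In practice you would call it as read_ppm(open(path, "rb").read()), where path is wherever the engine wrote the frame dump.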
This worked well for the textured quad I was working on at the time. With a project with this many custom parts, it's genuinely hard to root-cause issues: is it our compiler emitting bad CPU or shader code, is there a bug in our standard library, did we implement the runtime hooks correctly, etc. CC could now see that nothing was being drawn. It got a non-alpha-blended quad rendered, then saw in the .PPM that there was no transparency in the pixels surrounding the texture, and corrected the alpha blending.
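Checking blending from a frame dump comes down to arithmetic on sampled pixels. The sketch below assumes standard "source over" blending with straight (non-premultiplied) alpha; the post doesn't state which blend mode the engine uses, and both function names are hypothetical.

```python
def blend_over(src, dst, alpha):
    """Expected result of drawing src over dst with straight alpha in [0, 1]."""
    return tuple(round(s * alpha + d * (1.0 - alpha)) for s, d in zip(src, dst))


def matches(actual, expected, tolerance=1):
    """Compare two RGB pixels, allowing a small per-channel rounding error."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))
```

If a pixel sampled at the edge of the texture equals the texture color exactly, rather than a mix with the background, blending is probably disabled or the alpha channel is being ignored.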
ASCII Art Debugging
Things got more interesting with the font renderer. I was using msdf-gen and parts of letters were being cut off. The implementation was logically correct, i.e. letters were being drawn on the screen, but it looked like someone took a bite out of them. I asked CC to troubleshoot.
It tried the same approach: parse the PPM, find the issues. But some letters were drawn correctly, so it thought all of them were. I said "no, look at the 'e', the horizontal line is missing." Then it did something interesting: it wrote a Python script that extracted subregions out of the PPM and dumped them to the console as ASCII art. It would define some subregion of pixels, then print the shape out using text characters. From here, it could distinctly see what parts of the letter were missing, experimented with different msdf-gen algorithms, and resolved the bug.
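The script CC wrote isn't reproduced here, but the idea can be sketched in a few lines. This version assumes the frame has already been parsed into rows of (r, g, b) tuples; the function name and the character ramp are illustrative choices, not the originals.

```python
def ascii_region(pixels, x0, y0, w, h, ramp=" .:-=+*#%@"):
    """Render a w x h subregion of the frame as text, one character per pixel.

    Darker pixels map to characters early in the ramp, brighter pixels to
    characters later in it.
    """
    lines = []
    for row in pixels[y0:y0 + h]:
        chars = []
        for r, g, b in row[x0:x0 + w]:
            # Integer approximation of perceptual luminance (0..255).
            lum = (299 * r + 587 * g + 114 * b) // 1000
            chars.append(ramp[lum * (len(ramp) - 1) // 255])
        lines.append("".join(chars))
    return "\n".join(lines)
```

Printing the region around a single glyph turns a missing crossbar into a visible row of blank characters where the bar should be.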
From Scripts to a Tool
Over the course of the font rendering work, this approach matured into a pattern. Each time CC needed to investigate a new issue, it would write a one-off Python snippet. I eventually told it to stop regenerating these and save them to files so it could reuse them.
The separate scripts eventually got consolidated into a single tool: gfx-dbg.py. I asked CC what features would help it debug graphics issues, and it designed a CLI tool covering everything it had needed:
info: quick triage. Is anything rendering? Where? What colors?
ascii: dump a pixel region as ASCII art to "see" glyph shapes
pixel: sample specific coordinates to verify blending math
scan: find every unique color in a region with a histogram
spv: extract SPIR-V from asset packs and run spirv-val
diff: compare two frame dumps pixel-by-pixel after a change
hline/vline: trace RGB values across a row or column
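A command set like this maps naturally onto argparse subcommands. The skeleton below is a hypothetical layout only: the subcommand names come from the list above, but every argument name and position is an assumption, since the real gfx-dbg.py interface isn't shown in this post.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="gfx-dbg.py", description="Inspect PPM frame dumps from the console."
    )
    sub = parser.add_subparsers(dest="command", required=True)

    def frame_command(name, help_text):
        # Most subcommands operate on a single frame dump.
        p = sub.add_parser(name, help=help_text)
        p.add_argument("frame", help="path to a PPM frame dump")
        return p

    frame_command("info", "quick triage of the whole frame")

    # Region-based commands take a rectangle (assumed argument order).
    for name, help_text in (
        ("ascii", "dump a pixel region as ASCII art"),
        ("scan", "histogram of unique colors in a region"),
    ):
        p = frame_command(name, help_text)
        for arg in ("x", "y", "w", "h"):
            p.add_argument(arg, type=int)

    p = frame_command("pixel", "sample one coordinate")
    p.add_argument("x", type=int)
    p.add_argument("y", type=int)

    frame_command("hline", "trace RGB values across a row").add_argument("y", type=int)
    frame_command("vline", "trace RGB values down a column").add_argument("x", type=int)

    p = frame_command("diff", "compare two frame dumps")
    p.add_argument("other", help="path to the second frame dump")

    p = sub.add_parser("spv", help="extract SPIR-V and validate it")
    p.add_argument("pack", help="path to an asset pack")
    return parser
```

Keeping every query behind one entry point means the instructions handed to the model only have to describe a single tool.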
The Feedback Loop
What made this work wasn't any single script. It was the feedback loop. CC would make a change, run the app with trace logging to capture the PPM, then interrogate the frame dump to see what actually happened. When the 'e' crossbar was missing, it could zoom into that glyph's region and see the gap in the ASCII art. When transparency broke after adding the quad renderer, scan showed that the text region contained only white and black (no atlas colors), pointing to the wrong texture being sampled. When the 'l' appeared to have serifs, it rendered the same glyph with PIL to confirm that LiberationMono actually has serifs. The rendering was correct.
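The scan query in that loop is essentially a color histogram over a rectangle. A minimal sketch, assuming the frame is held as rows of (r, g, b) tuples (scan_colors is a hypothetical name):

```python
from collections import Counter


def scan_colors(pixels, x0, y0, w, h):
    """Count every distinct color in a region, most frequent first."""
    counts = Counter()
    for row in pixels[y0:y0 + h]:
        counts.update(row[x0:x0 + w])
    return counts.most_common()
```

A text region that reports only pure white and pure black, with none of the intermediate values an atlas lookup would produce, is the kind of result that pointed at the wrong texture being sampled.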
Closing the Loop
The last step was getting the tool into our CLAUDE.md so future sessions don't have to rediscover it. I had CC write up usage instructions for gfx-dbg.py with the typical workflow: info first to check if anything rendered, ascii to visually inspect, pixel for exact values, scan to catch unexpected colors. Now every new conversation starts knowing the tool exists and how to use it.
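The info step that opens that workflow can be approximated as "find everything that isn't the clear color." This sketch assumes a black clear color and a frame stored as rows of (r, g, b) tuples; frame_info and its return shape are hypothetical.

```python
def frame_info(pixels, background=(0, 0, 0)):
    """Summarize what was drawn: bounding box and distinct color count.

    Returns None when every pixel equals the background, i.e. nothing rendered.
    """
    xs, ys, colors = [], [], set()
    for y, row in enumerate(pixels):
        for x, px in enumerate(row):
            if px != background:
                xs.append(x)
                ys.append(y)
                colors.add(px)
    if not colors:
        return None
    return {
        "bbox": (min(xs), min(ys), max(xs), max(ys)),  # inclusive corners
        "colors": len(colors),
    }
```

A None result answers the first triage question immediately, and the bounding box tells the later ascii and scan steps where to look.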
At some point, I'd like to add a reference renderer so CC can do actual pixel diffs against known-good frames instead of just eyeballing ASCII art. That would make the diff command genuinely useful for catching regressions.
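The comparison itself is simple once a known-good frame exists. A minimal sketch of what it might look like, assuming both frames are rows of (r, g, b) tuples with identical dimensions (diff_frames and its tolerance parameter are hypothetical):

```python
def diff_frames(a, b, tolerance=0):
    """Return the (x, y) coordinates where two frames differ.

    A pixel counts as changed when any channel differs by more than tolerance.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same dimensions")
    changed = []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                changed.append((x, y))
    return changed
```

A small nonzero tolerance would help absorb rounding differences between renderers without hiding real regressions.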
An LLM can't look at your screen (yet), but it can parse structured representations of what's on screen. A PPM file converted to ASCII art or sampled at specific coordinates gives it enough signal to reason about rendering bugs the same way you would with a screenshot. The tool just codifies the queries it found itself needing over and over.