YabaSanshiro's VDP1 rendering gets even more accurate — reproducing SEGA Saturn drawing with Compute Shaders

YabaSanshiro is an emulator that actively leverages the GPU so that SEGA Saturn games run comfortably on modern hardware.

The goal is not just to "make it run", but to keep the feel of the real hardware while delivering high resolution and fast rendering at the same time. The Vulkan backend in particular has been taking advantage of the parallel processing that modern GPUs are good at, so that VDP1 drawing can be processed quickly.

But there has been a long-standing problem here.

Long-standing issues

The SEGA Saturn's VDP1 was designed around drawing sprites and polygons as quads (four-sided primitives).

Modern GPUs, on the other hand, are fundamentally designed to draw triangles. Vertex shaders, fragment shaders, and tessellation are all built around processing triangles efficiently.

This difference in design philosophy caused three main issues.

1. Distorted textures

VDP1 sprites are quads. But when you render a quad on a modern GPU, it is normally split into two triangles.

If you draw a sprite this way, a texture that should look natural across the whole quad can appear distorted across the seam between the two triangles.

VDP1 assumes textures are mapped onto quads. The standard rendering pipeline of a modern GPU, however, assumes interpolation per triangle. As a result, when you replace VDP1's distorted sprites with triangles directly, the way the texture looks no longer matches the real hardware.

Until now, YabaSanshiro mitigated this by tessellating the quad into many smaller pieces — the smaller each triangle, the smaller the per-triangle distortion.

This is not a fundamental fix, though. As long as the quad is approximated with triangles, you are not actually reproducing VDP1's quad rasterization itself.
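To make the seam distortion concrete, here is a minimal Python sketch (not YabaSanshiro's code). It assumes VDP1's quad mapping can be approximated by bilinear interpolation across the four vertices, which is a simplification, and compares it with the affine mapping you get after splitting the same quad along one diagonal. The function names and the sample quad are made up for illustration.

```python
def lerp(p, q, t):
    """Linear interpolation between two 2D points."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def quad_position(a, b, c, d, u, v):
    """Bilinear quad mapping (assumed stand-in for VDP1's behavior).

    Vertices: a = top-left, b = top-right, c = bottom-right, d = bottom-left.
    Walk down both side edges by v, then across between them by u.
    """
    left = lerp(a, d, v)
    right = lerp(b, c, v)
    return lerp(left, right, u)


def triangle_split_position(a, b, c, d, u, v):
    """Affine mapping after splitting the quad along the a-c diagonal."""
    if u >= v:
        # Triangle a, b, c with UVs (0,0), (1,0), (1,1).
        return (a[0] + u * (b[0] - a[0]) + v * (c[0] - b[0]),
                a[1] + u * (b[1] - a[1]) + v * (c[1] - b[1]))
    # Triangle a, c, d with UVs (0,0), (1,1), (0,1).
    return (a[0] + u * (c[0] - d[0]) + v * (d[0] - a[0]),
            a[1] + u * (c[1] - d[1]) + v * (d[1] - a[1]))


# A non-parallelogram quad: the left edge is shorter than the right edge.
A, B, C, D = (0, 0), (8, 0), (8, 8), (0, 4)

# Where does the center of the texture (u = v = 0.5) land on screen?
quad_center = quad_position(A, B, C, D, 0.5, 0.5)
split_center = triangle_split_position(A, B, C, D, 0.5, 0.5)
print(quad_center, split_center)
```

For this quad the two mappings place the texture center one pixel apart vertically. Finer tessellation shrinks that error but does not remove it.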

2. Right and bottom edges of the quad

Another big issue is how the right and bottom edges of a quad are handled.

On the SEGA Saturn, the right and bottom edges of the coordinate range are also part of the drawn region. For example, a sprite with its top-left at (0, 0) and bottom-right at (7, 7) is drawn as 8 pixels wide and 8 pixels tall.

In other words, the rightmost column at x = 7 and the bottom row at y = 7 are also drawn.

Modern GPU rasterization, by contrast, follows a rule where the right and bottom edges are not part of the drawn region — so adjacent polygons don't double-cover the same pixel. That's a sensible design for a modern GPU.

However, when you try to reproduce VDP1's quad rasterization, this difference becomes a problem. The rightmost column and the bottom row of each quad go missing, and thin gaps appear between adjacent polygons.

So far, we worked around this by extending coordinates by +1 — for example, expanding the drawn region by one pixel toward the bottom-right to approximate VDP1's "include the right/bottom edges" behavior.

This correction is not perfect either. In scenes with many small connected polygons (such as a flag), gaps still appear at the polygon boundaries.
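The edge rules described in this section can be compared in a simplified one-dimensional model. The sketch below is illustrative only: the function names are hypothetical, and the GPU side assumes the usual convention that a pixel is covered when its center lies in a half-open range.

```python
def vdp1_columns(x_left, x_right):
    """Columns drawn when both end coordinates are inclusive (VDP1-style)."""
    return list(range(x_left, x_right + 1))


def gpu_columns(x_left, x_right):
    """Columns drawn under a typical GPU rule (simplified to one axis).

    A pixel is covered when its center (x + 0.5) lies in [x_left, x_right),
    so the right edge itself is excluded.
    """
    candidates = range(x_left - 1, x_right + 2)
    return [x for x in candidates if x_left <= x + 0.5 < x_right]


print(len(vdp1_columns(0, 7)))  # inclusive rule for a sprite from x = 0 to x = 7
print(len(gpu_columns(0, 7)))   # same coordinates under the GPU rule
print(gpu_columns(0, 7 + 1))    # the "+1" workaround
```

For an axis-aligned sprite, the +1 adjustment recovers the missing column. This model does not cover rotated or distorted quads, where a simple vertex shift does not line up with the inclusive rule so cleanly.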

3. Coordinate precision issues that show up at high resolution

The third issue is a coordinate precision problem that becomes visible once you start upscaling.

At the SEGA Saturn's native resolution, tiny coordinate offsets are barely noticeable. The original screen resolution is low, so single-pixel roughness blends naturally into the overall image.

But when an emulator upscales the output by 2x or 4x, offsets that used to be invisible become very visible.

The ground in Sega Rally was a particularly bad case. Thin gaps and unnatural lines appeared along the borders of the polygons that make up the road surface.

This is not just a side effect of bumping up the rendering resolution. VDP1's coordinate interpretation and modern GPU triangle rasterization differ subtly, and what was hidden at low resolution gets exposed once you scale up.

To display nicely at high resolution, simply increasing the resolution is not enough. We needed to reproduce VDP1's "which pixels should be filled" rule itself, more accurately.

A new solution using Compute Shaders

This update introduces a new rendering approach using compute shaders, in order to reproduce VDP1 drawing more faithfully to the real hardware.

We initially considered an approach that starts from each point of the texture and computes where on screen it should be drawn. This is closer to how VDP1 works conceptually, and it makes it easy to handle the right/bottom-inclusive rule.

However, on Android devices this approach did not deliver enough performance. So the current implementation starts from the screen-side pixels instead, and computes the corresponding VDP1 coordinate and texel inside the compute shader.

Standard GPU rendering vs. Compute Shader rendering

The previous Vulkan path split VDP1 quads into triangles for the modern GPU. The new approach does not rely on triangle rasterization. Instead, the compute shader decides per-pixel: "is this pixel inside VDP1's drawn region?" and "if so, which texel does it correspond to?"

In other words, the current method is not a forward mapping from texture to screen. It is a reverse mapping from the screen back to VDP1's quad.

The key point is that even though the entry point is screen pixels, the rules used for the test follow VDP1, not the modern GPU's triangle rasterization. We explicitly choose which pixels to fill in the compute shader, matching VDP1's right/bottom-inclusive coordinate interpretation.

For example, an 8-pixel-wide sprite is treated as 8 cells on screen. Instead of relying on tricks like vertex +1 adjustments or expanding only the bottom-right, we approximate VDP1's rules with a coverage test on the compute shader side.

It also handles upscaling.

At native resolution, one Saturn pixel maps to one output pixel. At HD resolution (say, 4x), one Saturn pixel corresponds to a 4x4 block of HD pixels.

The new compute shader figures out the Saturn coordinate for each HD pixel, and tests whether that coordinate is inside the VDP1 drawing cell. This way, we keep the real hardware's rasterization rules while still avoiding gaps even at high resolution.
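As a rough illustration of that mapping, the sketch below converts an HD pixel to a Saturn cell with integer division and then applies an inclusive test. It is an assumption-laden simplification: it handles only an axis-aligned sprite, and the names `saturn_coord` and `covered` are hypothetical rather than taken from the actual shader.

```python
def saturn_coord(hd_x, hd_y, scale):
    """Map an output HD pixel to the Saturn pixel cell that contains it."""
    return hd_x // scale, hd_y // scale


def covered(hd_x, hd_y, scale, left, top, right, bottom):
    """Inclusive coverage test, done in Saturn coordinates."""
    sx, sy = saturn_coord(hd_x, hd_y, scale)
    return left <= sx <= right and top <= sy <= bottom


scale = 4
# Sprite from (0, 0) to (7, 7), inclusive: 8 Saturn cells wide.
hd_columns = [x for x in range(64) if covered(x, 0, scale, 0, 0, 7, 7)]
print(len(hd_columns))
```

Because the test runs on the Saturn cell rather than on the HD coordinate, every HD pixel inside a covered cell is filled, so the 8-cell sprite becomes exactly 32 HD pixels wide with no partial column at the edge.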

What the compute shader does

The new path runs a compute shader for each VDP1 command.

In the current screen-side reverse-mapping approach, each thread is responsible for a screen pixel — or, when upscaled, an output HD pixel.

The flow looks roughly like this.

  1. Read the four vertices of the sprite or polygon from the VDP1 command.
  2. Test whether the current screen pixel falls inside that VDP1 command's drawing region.
  3. If it does, compute the corresponding texture coordinate from the screen coordinate.
  4. Apply texture flipping, color mode, Gouraud shading, half-transparency, mesh, and so on.
  5. Write the final color to the framebuffer.
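The steps above can be sketched on the CPU as follows. This is a simplified model rather than the real shader: it assumes an axis-aligned sprite, covers only texture flipping from step 4 (no color modes, Gouraud shading, half-transparency, or mesh), and `SpriteCommand`, `shade_pixel`, and `run_command` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SpriteCommand:
    """Hypothetical, simplified stand-in for a VDP1 command."""
    left: int
    top: int
    right: int       # inclusive
    bottom: int      # inclusive
    texture: list    # rows of color values
    flip_h: bool = False
    flip_v: bool = False


def shade_pixel(cmd, x, y, framebuffer):
    """Work done by one 'thread' for screen pixel (x, y)."""
    # Step 2: coverage test with inclusive right/bottom edges.
    if not (cmd.left <= x <= cmd.right and cmd.top <= y <= cmd.bottom):
        return
    # Step 3: screen coordinate -> texel coordinate.
    tex_h, tex_w = len(cmd.texture), len(cmd.texture[0])
    width = cmd.right - cmd.left + 1
    height = cmd.bottom - cmd.top + 1
    tx = (x - cmd.left) * tex_w // width
    ty = (y - cmd.top) * tex_h // height
    # Step 4 (flipping only in this sketch).
    if cmd.flip_h:
        tx = tex_w - 1 - tx
    if cmd.flip_v:
        ty = tex_h - 1 - ty
    # Step 5: write the color.
    framebuffer[y][x] = cmd.texture[ty][tx]


def run_command(cmd, framebuffer):
    """Sequential stand-in for dispatching one thread per screen pixel."""
    for y in range(len(framebuffer)):
        for x in range(len(framebuffer[0])):
            shade_pixel(cmd, x, y, framebuffer)


fb = [[0] * 4 for _ in range(4)]
cmd = SpriteCommand(left=1, top=1, right=2, bottom=2,
                    texture=[[1, 2], [3, 4]], flip_h=True)
run_command(cmd, fb)
for row in fb:
    print(row)
```

Each call to `shade_pixel` depends only on its own pixel, which is what lets the real implementation run one GPU thread per output pixel.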

The previous triangle-based path required a lot of corrections to match the SEGA Saturn's specification.

The new approach steps away from the fixed rules of triangle rasterization, and instead controls coverage testing and texture lookup for VDP1 inside the compute shader. It is not a direct forward mapping from texture to screen, but it is a practical method that stays performant on Android while getting closer to VDP1's rasterization rules.

Rendering quality improvements

In places where tessellation was too coarse, textures used to come out distorted. With this method, smooth rendering is achieved through simple per-pixel logic.

Decisions like "how should we extend the bottom-right coordinate?", which were tricky to handle in a vertex-based approach, are now settled by the inclusive coverage test itself.

Things that should be lines on the real hardware can become trapezoids at high resolution. Special handling is added so that they continue to draw as lines. Logic that was hard to express on a vertex basis becomes straightforward inside the compute shader.

Performance

We're not just chasing accuracy with compute shaders — we care about performance too.

The previous Vulkan path translated VDP1 drawing into triangle rasterization on the GPU. That was fast, but it required corrections to compensate for the differences against the SEGA Saturn's native drawing rules.

The new approach runs VDP1's coverage test directly inside a compute shader.

The texture-side forward-mapping approach we initially explored is conceptually closer to how VDP1 works, but it was not fast enough on Android. Because the work starts from texels and writes into the screen, overlapping draws and the increased write volume at high resolution make it hard to keep a usable frame rate on some devices.

For that reason, the current implementation starts from screen pixels. With this approach, the per-output-pixel coverage test and texture lookup keep the load manageable on Android.

Even with this much extra work, the implementation stays at 60fps.

The results below were aggregated per second on real hardware (AYN Thor / Snapdragon 8 Gen 2, Adreno 740) running the same scene.

FPS by VDP1 rendering method

Rendering method                            | Avg fps | Min fps | Max fps | Holds 60fps
Compute Shader (new / tile_shade)           | ~78     | 63      | 112     | Yes
Tessellation (previous / graphics pipeline) | ~180    | 152     | 234     | Yes

The previous tessellation method is still faster in raw rendering speed — in some scenes by more than 2x. That's because modern GPUs are heavily optimized for triangle rasterization, and the gap shows up especially at high resolution.

The compute shader method, on the other hand, performs the coverage test and texture lookup itself per pixel, so its relative cost is higher. However, in this measurement it never dropped below 60fps (63fps at the lowest), which is enough for actual gameplay.
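To put the measured numbers in terms of frame time, the short calculation below converts fps to milliseconds per frame and compares the two methods. It uses only the figures from the table; the variable names are illustrative.

```python
FRAME_BUDGET_MS = 1000 / 60  # time available per frame at 60fps

# Measured values from the table (avg values are approximate).
results = {
    "compute": {"avg": 78, "min": 63, "max": 112},
    "tessellation": {"avg": 180, "min": 152, "max": 234},
}


def frame_time_ms(fps):
    """Milliseconds spent per frame at a given frame rate."""
    return 1000 / fps


# Slowest measured second of the compute shader path.
worst_compute_ms = frame_time_ms(results["compute"]["min"])
headroom_ms = FRAME_BUDGET_MS - worst_compute_ms

# How much faster tessellation is, per column of the table.
speed_ratio = {
    key: results["tessellation"][key] / results["compute"][key]
    for key in ("avg", "min", "max")
}

print(f"budget: {FRAME_BUDGET_MS:.2f} ms, worst compute frame: {worst_compute_ms:.2f} ms")
print(f"headroom: {headroom_ms:.2f} ms")
print({key: round(value, 2) for key, value in speed_ratio.items()})
```

The slowest second of the compute shader path still fits inside the 60fps frame budget, and tessellation comes out a little over 2x faster in every column.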

The main goal of the new approach is to balance accuracy (no texture distortion, no gaps, correct one-pixel boundaries) with practical performance on Android (stable 60fps).

Summary

This VDP1 compute shader work is a major step forward in rendering quality for YabaSanshiro.

The previous approach required tricks like vertex corrections and expanded drawing regions in order to bend modern triangle rendering toward VDP1's specification.

The new approach uses a compute shader to perform VDP1-style coverage testing and texture lookup explicitly.

As a result, three things come together:

  • More accurate rendering, closer to real SEGA Saturn hardware
  • Stable, gap-free output even at high resolution
  • Practical performance that holds up on Android

YabaSanshiro will keep leveraging modern GPU features, and continue working toward an environment where SEGA Saturn games can be enjoyed even more beautifully, more accurately, and more comfortably.