
Part 4 | Frontend Hardware-Accelerated Decoding and Ultra-Low-Latency Rendering with WebCodecs & OffscreenCanvas

The desktop helper interface of the Easy Connect Suite runs inside Wails, using a frontend web view built with Vue 3 and TypeScript. Playing a real-time H.264 video stream in a browser client introduces two major challenges:

  1. Decoding Overhead: Software decoding in JavaScript or WebAssembly (e.g. using ffmpeg.js) is CPU-intensive. When processing 2K or 4K streams, garbage collection overhead and thread synchronization lag can cause dropped frames and sustained high CPU load.
  2. Buffering Latency: Native HTML5 <video> players and Media Source Extensions (MSE) buffer video segments to ensure smooth playback, adding 1 to 2 seconds of latency. This is too slow for real-time remote control.

To target sub-150ms latency, we implement a dual-track hardware rendering pipeline (useFramePipeline.ts). It uses Media Source Extensions (MSE) as the primary playback engine, with the default buffering behavior corrected as described in section 1.2, and automatically falls back to the WebCodecs API with OffscreenCanvas rendering when MSE cannot be used.


1. Primary Path: Media Source Extensions (MSE) Hardware-Accelerated Playback

For standard remote support sessions, we prioritize the MSE pipeline to leverage native browser video hardware acceleration across various operating systems:

[ WebSocket / Wails Backend ]

            ▼ (Packages frames into fMP4 segments on-the-fly)
[ fMP4 Container Segments ]

            ▼ (appended via appendBuffer)
[ Browser Native SourceBuffer ]

            ▼ (HTML5 <video> Element)
[ GPU Hardware Decoding & Rendering ] ──► (Low-latency correction via auto-seek on timeupdate)

1.1 Live fMP4 Remuxing and Injection

  1. fMP4 Packaging: The Wails backend packages raw H.264 frames received from the controlled client into fragmented MP4 (fMP4) container segments.
  2. Native Playback: The frontend creates a MediaSource object and attaches it to an HTML5 <video> element. As fMP4 segments arrive over the WebSocket, they are appended to the SourceBuffer via appendBuffer(...), which hands video decoding and compositing to the browser's native media pipeline.
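One detail the steps above imply is that appendBuffer is asynchronous: a SourceBuffer rejects a new append while a previous one is still in progress (its updating flag is true). The following is a minimal sketch of how segments could be queued, not the project's actual code. SegmentAppender and AppendTarget are hypothetical names, and the MIME string in the wiring comment is an assumed example value.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
// A real SourceBuffer satisfies this shape.
interface AppendTarget {
  readonly updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Hypothetical helper: serializes appendBuffer calls, because a SourceBuffer
// throws InvalidStateError if appendBuffer is called while `updating` is true.
class SegmentAppender {
  private readonly pending: ArrayBuffer[] = [];
  private readonly target: AppendTarget;

  constructor(target: AppendTarget) {
    this.target = target;
    // Drain the next queued segment each time the previous append finishes.
    target.addEventListener("updateend", () => this.flush());
  }

  push(segment: ArrayBuffer): void {
    this.pending.push(segment);
    this.flush();
  }

  get queued(): number {
    return this.pending.length;
  }

  private flush(): void {
    if (this.target.updating) return;
    const next = this.pending.shift();
    if (next !== undefined) this.target.appendBuffer(next);
  }
}

// Browser wiring (sketch):
// const mediaSource = new MediaSource();
// video.src = URL.createObjectURL(mediaSource);
// mediaSource.addEventListener("sourceopen", () => {
//   const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01F"');
//   const appender = new SegmentAppender(sb);
//   socket.onmessage = (e) => appender.push(e.data as ArrayBuffer);
// });
```

Queuing on the client keeps the WebSocket handler non-blocking; a production version would probably also cap the queue length so that a stalled SourceBuffer cannot grow memory without bound.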

1.2 Low-Latency Buffer Correction

To bypass the default buffering behavior of native HTML5 players, a listener monitors the video's timeupdate event. If the playback position falls behind the end of the buffered ranges (bufferedEnd) by more than 2.0 seconds, it triggers an in-place seek forward:

```typescript
video.currentTime = bufferedEnd - 0.2; // Force playback to catch up to the latest frames
```
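To make the surrounding logic concrete, here is a minimal sketch of the lag check around that seek. The function name catchUpTarget and its parameter names are hypothetical; the 2.0 s threshold and 0.2 s offset are the values quoted above.

```typescript
// Returns the time to seek to, or null when playback is close enough to live.
// Defaults mirror the values in the text (2.0 s lag threshold, 0.2 s offset).
function catchUpTarget(
  currentTime: number,
  bufferedEnd: number,
  maxLagSec: number = 2.0,
  liveOffsetSec: number = 0.2,
): number | null {
  const lag = bufferedEnd - currentTime;
  if (lag <= maxLagSec) return null;
  // Never seek to a negative position on very short buffers.
  return Math.max(0, bufferedEnd - liveOffsetSec);
}

// Browser wiring (sketch):
// video.addEventListener("timeupdate", () => {
//   const ranges = video.buffered;
//   if (ranges.length === 0) return; // nothing buffered yet
//   const bufferedEnd = ranges.end(ranges.length - 1);
//   const target = catchUpTarget(video.currentTime, bufferedEnd);
//   if (target !== null) video.currentTime = target;
// });
```

Keeping the decision in a pure function makes the thresholds easy to tune and to unit-test without a real video element.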

1.3 Automatic Fallback

The MSE pipeline inspects incoming data before appending it. If frames arrive wrapped in the custom compact_binary_v1 envelope instead of fMP4 containers, or if the browser cannot negotiate the codec configuration, the client treats this as a format mismatch, tears down the MSE player, and falls back to the WebCodecs rendering pipeline, so the same stream stays playable either way.
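One way such a check could work is to look for the ISO BMFF box header that every fMP4 segment starts with (a 4-byte big-endian size followed by a 4-byte ASCII box type). The sketch below is an assumption about the approach, not the project's actual detection code: looksLikeFmp4 and choosePipeline are hypothetical names, and the layout of compact_binary_v1 is not described here, so anything that does not look like fMP4 is simply routed to WebCodecs.

```typescript
// Box types that can legitimately open an fMP4 init or media segment.
const FMP4_LEADING_BOXES: string[] = ["ftyp", "styp", "moof", "sidx"];

function looksLikeFmp4(packet: ArrayBuffer): boolean {
  if (packet.byteLength < 8) return false;
  // Bytes 4..7 hold the ASCII box type; bytes 0..3 hold the box size.
  const typeBytes = new Uint8Array(packet, 4, 4);
  const type = Array.from(typeBytes, (b) => String.fromCharCode(b)).join("");
  return FMP4_LEADING_BOXES.indexOf(type) !== -1;
}

type Pipeline = "mse" | "webcodecs";

// `mseCodecSupported` would come from something like
// MediaSource.isTypeSupported('video/mp4; codecs="..."') in the browser.
function choosePipeline(firstPacket: ArrayBuffer, mseCodecSupported: boolean): Pipeline {
  return mseCodecSupported && looksLikeFmp4(firstPacket) ? "mse" : "webcodecs";
}
```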


2. Fallback Path Core: Web Workers and Transferable Objects

When the client falls back to WebCodecs, we move packet parsing and decoding to a background thread (decoder.worker.ts) so that the browser UI thread stays responsive for window events and input routing:

[ Wails / WebSocket Main Thread ]

                ▼ (Transferable Object ArrayBuffer, zero copy)
      [ Web Worker Thread ]

                ├─► 1. Parse Envelope & Convert Annex-B to AVCC
                ├─► 2. WebCodecs (VideoDecoder) Hardware Decoding
                ├─► 3. Draw directly to OffscreenCanvas

                ▼ (Rendered directly in GPU)
    [ Screen <canvas> Element ]

  1. Zero-Copy Frame Transfer: The main thread passes incoming video packets (ArrayBuffer) to the worker via postMessage(data, [data]). Listing the buffer in the transfer array moves memory ownership to the worker instead of copying it, which avoids expensive copy operations and garbage collection overhead. The buffer is detached on the main thread afterwards.
  2. Offscreen Canvas Transfer: During initialization, the main thread calls canvas.transferControlToOffscreen() on the UI <canvas> element and transfers the resulting OffscreenCanvas to the worker. This allows the worker to render to the canvas without going through the main thread.
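The two transfers above can be sketched as follows. forwardPacket and PacketSink are hypothetical names introduced for illustration, and the shape of the init message in the wiring comment is an assumption; only postMessage with a transfer list and transferControlToOffscreen() are the APIs named in the text.

```typescript
// Minimal structural type: a real Worker satisfies this shape.
interface PacketSink {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
}

function forwardPacket(sink: PacketSink, packet: ArrayBuffer): void {
  // Listing the buffer in the transfer list moves ownership to the receiver.
  // With a real Worker, `packet.byteLength` is 0 on this side afterwards,
  // so the packet must not be read again on the main thread.
  sink.postMessage(packet, [packet]);
}

// Browser wiring (sketch, message shapes are assumptions):
// const worker = new Worker(new URL("./decoder.worker.ts", import.meta.url), { type: "module" });
// const offscreen = canvasEl.transferControlToOffscreen();
// worker.postMessage({ type: "init", canvas: offscreen }, [offscreen]);
// socket.binaryType = "arraybuffer";
// socket.onmessage = (e) => forwardPacket(worker, e.data as ArrayBuffer);
```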

3. Fallback Decoding: WebCodecs API Integration

Inside the worker, the WebCodecs VideoDecoder gives us access to the platform's video decoder, which the browser backs with hardware decoding when it is available.

3.1 Annex-B to AVCC Format Conversion

H.264 streams sent over the network generally use the Annex-B byte-stream format, in which NAL units are separated by 0x00000001 or 0x000001 start codes. When a WebCodecs VideoDecoder is configured with a description (an avcC byte block containing the SPS/PPS metadata), as it is here, each chunk must instead be in AVCC format, with every NAL unit prefixed by its length (4 bytes in our case).

To handle this, we implement a fast scanner:

  • Length Re-writing: Locates Annex-B start codes and writes the big-endian segment length in-place within a single memory buffer.
  • Metadata Parsing: Extracts SPS/PPS bytes from the first keyframe to build the AVCC configuration block, initializing the decoder via decoder.configure(...).
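The two steps above can be sketched as follows. This is a simplified illustration rather than the scanner described in the text: it writes into a new buffer instead of rewriting in place (in-place rewriting only lines up byte-for-byte when every start code is 4 bytes long), and the function names are hypothetical. buildAvcC emits the basic avcC layout with a single SPS and PPS; for High profiles the ISO format defines extra chroma and bit-depth fields, which this sketch omits.

```typescript
// Splits an Annex-B buffer into NAL units with start codes removed.
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const nals: Uint8Array[] = [];
  let nalStart = -1; // index of the first byte after the most recent start code
  let i = 0;
  while (i + 3 <= data.length) {
    if (!(data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1)) {
      i++;
      continue;
    }
    if (nalStart >= 0) {
      // Zero bytes directly before 00 00 01 belong to a 4-byte start code
      // (or trailing padding), not to the preceding NAL unit.
      let end = i;
      while (end > nalStart && data[end - 1] === 0) end--;
      if (end > nalStart) nals.push(data.subarray(nalStart, end));
    }
    nalStart = i + 3;
    i += 3;
  }
  if (nalStart >= 0 && nalStart < data.length) nals.push(data.subarray(nalStart));
  return nals;
}

// Re-packs the NAL units with 4-byte big-endian length prefixes (AVCC).
function annexBToAvcc(data: Uint8Array): Uint8Array {
  const nals = splitAnnexB(data);
  const total = nals.reduce((sum, nal) => sum + 4 + nal.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const nal of nals) {
    view.setUint32(offset, nal.length, false); // false = big-endian
    out.set(nal, offset + 4);
    offset += 4 + nal.length;
  }
  return out;
}

// Builds a basic avcC block from one SPS (NAL type 7) and one PPS (NAL type 8).
function buildAvcC(sps: Uint8Array, pps: Uint8Array): Uint8Array {
  if (sps.length < 4) throw new Error("SPS too short");
  const out = new Uint8Array(11 + sps.length + pps.length);
  out[0] = 1;                        // configurationVersion
  out.set(sps.subarray(1, 4), 1);    // profile_idc, constraint flags, level_idc
  out[4] = 0xff;                     // reserved bits + lengthSizeMinusOne = 3
  out[5] = 0xe1;                     // reserved bits + SPS count = 1
  out[6] = (sps.length >> 8) & 0xff; // SPS length, big-endian
  out[7] = sps.length & 0xff;
  out.set(sps, 8);
  const ppsOffset = 8 + sps.length;
  out[ppsOffset] = 1;                // PPS count
  out[ppsOffset + 1] = (pps.length >> 8) & 0xff;
  out[ppsOffset + 2] = pps.length & 0xff;
  out.set(pps, ppsOffset + 3);
  return out;
}
```

The SPS and PPS passed to buildAvcC can be picked out of the splitAnnexB result by NAL type, which is the low five bits of each unit's first byte.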

3.2 Hardware Decoding

The VideoDecoder submits AVCC frames to the system's hardware decoder (such as DXVA on Windows, VideoToolbox on macOS, or VA-API on Linux). Each decoded picture is delivered to the decoder's output callback as a VideoFrame object. With hardware decoding, the frame can remain GPU-backed, which avoids CPU-side pixel copies.
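Configuring the decoder needs a codec string in addition to the avcC description. A minimal sketch, assuming the codec string is derived from the SPS: for H.264 it has the form avc1.PPCCLL, where the three hex bytes are the profile, constraint flags, and level found in bytes 1 to 3 of the SPS NAL unit. The helper names and the AvcDecoderConfig type are hypothetical; hardwareAcceleration and optimizeForLatency are standard VideoDecoderConfig fields, and requesting "prefer-hardware" here is an assumed choice.

```typescript
// Derives the "avc1.PPCCLL" codec string from an SPS NAL unit.
function codecStringFromSps(sps: Uint8Array): string {
  if (sps.length < 4) throw new Error("SPS too short");
  const hex = Array.from(sps.subarray(1, 4), (b) => ("0" + b.toString(16)).slice(-2)).join("");
  return "avc1." + hex;
}

// Local mirror of the VideoDecoderConfig fields used here, so the sketch
// does not depend on WebCodecs typings being installed.
interface AvcDecoderConfig {
  codec: string;
  description: Uint8Array;
  hardwareAcceleration: "no-preference" | "prefer-hardware" | "prefer-software";
  optimizeForLatency: boolean;
}

function makeDecoderConfig(sps: Uint8Array, avcC: Uint8Array): AvcDecoderConfig {
  return {
    codec: codecStringFromSps(sps),
    description: avcC,                       // presence of this selects AVCC input
    hardwareAcceleration: "prefer-hardware", // a hint; the browser may still decline
    optimizeForLatency: true,
  };
}

// Worker wiring (sketch):
// const config = makeDecoderConfig(sps, avcC);
// const { supported } = await VideoDecoder.isConfigSupported(config);
// if (supported) decoder.configure(config);
```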


4. GPU Rendering: Direct Drawing via OffscreenCanvas

Sending decoded frames back to the main thread for rendering would introduce message-passing overhead.

Since the worker controls the canvas, it draws the VideoFrame directly to the OffscreenCanvasRenderingContext2D:

```typescript
// Render the VideoFrame to the canvas via GPU copy
ctx.drawImage(videoFrame, 0, 0, canvas.width, canvas.height);
// Release the frame immediately to prevent memory leaks
videoFrame.close();
```

The browser handles scaling and rendering on the GPU, keeping the main JS thread free for Wails window actions and input routing.
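One refinement worth considering is to release the frame even when drawing throws, since a decoder can stall once too many unclosed VideoFrame objects are outstanding. A minimal sketch, with renderAndRelease and the two small interfaces being hypothetical names used so the example does not depend on DOM typings:

```typescript
// Structural stand-ins: a real VideoFrame and OffscreenCanvasRenderingContext2D
// satisfy these shapes.
interface ClosableFrame {
  close(): void;
}
interface FrameCanvas<F> {
  drawImage(image: F, dx: number, dy: number, dw: number, dh: number): void;
}

function renderAndRelease<F extends ClosableFrame>(
  ctx: FrameCanvas<F>,
  frame: F,
  width: number,
  height: number,
): void {
  try {
    ctx.drawImage(frame, 0, 0, width, height);
  } finally {
    frame.close(); // always release, even if drawImage throws
  }
}

// Worker wiring (sketch):
// const decoder = new VideoDecoder({
//   output: (frame) => renderAndRelease(ctx, frame, canvas.width, canvas.height),
//   error: (e) => console.error("decode error", e),
// });
```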


5. Backpressure and Queue Management

When network bandwidth drops or decoding slows down, frames can stack up in the decode queue, causing input lag. We implement a backpressure algorithm inside the worker to manage this:

                       ┌───────────────────────────────┐
                       │    Monitor decodeQueueSize    │
                       └───────────────┬───────────────┘

        ┌──────────────────────────────┼──────────────────────────────┐
        ▼ (Size > Q_SOFT 8~10)         ▼ (Size > Q_HARD 16~18)        ▼ (Latency > 1000ms)
  [ Frame Discard Policy ]       [ Decoder Reset Policy ]       [ Catch-Up Policy ]
  Discard P-frames until         Reset VideoDecoder,            Clear queue buffers,
  next keyframe (IDR).           request keyframe from sender.  decode next keyframe.

  1. Soft Limit (Q_SOFT = 8 or 10): If the decode queue size exceeds this threshold, the worker discards incoming P-frames until it receives the next H.264 IDR keyframe, preventing display lag.
  2. Hard Limit (Q_HARD = 16 or 18): If the queue grows past this limit, the worker resets the hardware decoder via decoder.reset(), reconfigures it (reset() returns the decoder to the unconfigured state), and requests an immediate sync frame from the sender.
  3. Live Catch-up Mode: If the difference between the frame timestamp and the local clock exceeds 1000ms, the client discards the queue and decodes the next keyframe to catch up.
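The three policies can be sketched as a single decision function. This is one possible interpretation, not the worker's actual code: the names, the order in which the checks run, and the choice of Q_SOFT = 8 and Q_HARD = 16 (the lower of the values quoted above) are assumptions.

```typescript
type Action = "decode" | "drop" | "reset" | "catch-up";

interface BackpressureState {
  waitingForKeyframe: boolean; // true while non-key frames must be discarded
}

interface FrameInfo {
  isKeyframe: boolean;
  queueSize: number; // decoder.decodeQueueSize at the time the frame arrives
  latencyMs: number; // local clock minus frame timestamp
}

const Q_SOFT = 8;
const Q_HARD = 16;
const MAX_LATENCY_MS = 1000;

function decide(state: BackpressureState, frame: FrameInfo): Action {
  if (frame.queueSize > Q_HARD) {
    // Caller: decoder.reset(), configure() again, ask the sender for a keyframe.
    state.waitingForKeyframe = true;
    return "reset";
  }
  if (state.waitingForKeyframe) {
    if (!frame.isKeyframe) return "drop";
    state.waitingForKeyframe = false; // resynchronize on the keyframe
    return "decode";
  }
  if (frame.latencyMs > MAX_LATENCY_MS) {
    // Caller: clear queued packets; decoding resumes at the next keyframe.
    state.waitingForKeyframe = true;
    return "catch-up";
  }
  if (frame.queueSize > Q_SOFT && !frame.isKeyframe) {
    state.waitingForKeyframe = true;
    return "drop";
  }
  return "decode";
}
```

Once a frame has been dropped, every following P-frame depends on data the decoder never saw, which is why the sketch keeps dropping until a keyframe arrives rather than resuming as soon as the queue shrinks.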

Conclusion

By prioritizing MSE hardware-accelerated playback and using WebCodecs + OffscreenCanvas as a fallback path, the remote control client achieves wide compatibility, high frame rates, and sub-150ms screen updates in the browser. In the next and final part of this series, we will examine adaptive transmission optimizations and telemetry-driven bitrate/FPS tuning.

Released under the MIT License.