Part 4 | Frontend Hardware-Accelerated Decoding and Ultra-Low-Latency Rendering with WebCodecs & OffscreenCanvas
The desktop helper interface of the Easy Connect Suite runs inside Wails, using a frontend web view built with Vue 3 and TypeScript. Playing a real-time H.264 video stream in a browser client introduces two major challenges:
- Decoding Overhead: Software decoding in JavaScript or WebAssembly (e.g. using ffmpeg.js) is CPU-intensive. When processing 2K or 4K streams, garbage collection overhead and thread synchronization lag can drop frames and heat up the CPU.
- Buffering Latency: Native HTML5 <video> players and Media Source Extensions (MSE) buffer video segments to ensure smooth playback, adding 1 to 2 seconds of latency. This is too slow for real-time remote control.
To achieve sub-150ms latency, we implement a dual-track hardware rendering pipeline (useFramePipeline.ts). It uses Media Source Extensions (MSE) as the primary playback engine, with a seek-based correction to counter the default buffering, and falls back automatically to the WebCodecs API with OffscreenCanvas rendering.
1. Primary Path: Media Source Extensions (MSE) Hardware-Accelerated Playback
For standard remote support sessions, we prioritize the MSE pipeline to leverage native browser video hardware acceleration across various operating systems:
[ WebSocket / Wails Backend ]
│
▼ (Packages frames into fMP4 segments on-the-fly)
[ fMP4 Container Segments ]
│
▼ (appended via appendBuffer)
[ Browser Native SourceBuffer ]
│
▼ (HTML5 <video> Element)
[ GPU Hardware Decoding & Rendering ] ──► (Low-latency correction via auto-seek on timeupdate)
1.1 Live fMP4 Remuxing and Injection
- fMP4 Packaging: The Wails backend packages raw H.264 frames received from the controlled client into fragmented MP4 (fMP4) container segments.
- Native Playback: The frontend initializes a native MediaSource object bound to an HTML5 <video> element. As fMP4 segments arrive over the WebSocket, they are appended to the SourceBuffer via appendBuffer(...), passing video decoding and compositing directly to the OS media subsystem.
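A SourceBuffer throws if appendBuffer is called while a previous append is still in progress (its updating flag is true), so incoming segments usually need a small queue in front of it. The following is a minimal sketch of such a queue, not the actual code in useFramePipeline.ts. SegmentAppender and SourceBufferLike are hypothetical names, and the codec string in the comments is only an example.

```typescript
// Structural subset of SourceBuffer, so the sketch does not depend on DOM typings.
interface SourceBufferLike {
  readonly updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Hypothetical helper: serializes appends so only one is in flight at a time.
class SegmentAppender {
  private pending: ArrayBuffer[] = [];
  private sourceBuffer: SourceBufferLike;

  constructor(sourceBuffer: SourceBufferLike) {
    this.sourceBuffer = sourceBuffer;
    // "updateend" fires when an append finishes; drain the next segment then.
    sourceBuffer.addEventListener("updateend", () => this.flush());
  }

  /** Number of segments still waiting to be appended. */
  get queued(): number {
    return this.pending.length;
  }

  push(segment: ArrayBuffer): void {
    this.pending.push(segment);
    this.flush();
  }

  private flush(): void {
    if (this.sourceBuffer.updating) return; // an append is still running
    const next = this.pending.shift();
    if (next !== undefined) this.sourceBuffer.appendBuffer(next);
  }
}

// Browser-only wiring, shown as comments because it needs a DOM:
// const mediaSource = new MediaSource();
// video.src = URL.createObjectURL(mediaSource);
// mediaSource.addEventListener("sourceopen", () => {
//   const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01F"');
//   const appender = new SegmentAppender(sb);
//   socket.onmessage = (ev) => appender.push(ev.data as ArrayBuffer);
// });
```

Keeping the queue short matters here: segments waiting in it add directly to the end-to-end latency.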
1.2 Low-Latency Buffer Correction
To bypass the default buffering behavior of native HTML5 players, a listener monitors the video's timeupdate event. If the playback position falls behind the end of the buffered ranges (bufferedEnd) by more than 2.0 seconds, it triggers an in-place seek forward:
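const bufferedEnd = video.buffered.end(video.buffered.length - 1); // End of the last buffered range
video.currentTime = bufferedEnd - 0.2; // Force playback to catch up to the latest frames
1.3 Automatic Fallback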
If the MSE pipeline receives frames wrapped in the custom compact_binary_v1 envelope instead of fMP4 containers, or if the browser fails to negotiate codec configurations, it detects the signature mismatch. The client then automatically tears down the MSE player and falls back to the WebCodecs rendering pipeline, ensuring compatibility.
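One way to detect the mismatch is to inspect the first bytes of each payload. The sketch below is an assumption about how such a check could look, not the project's actual detector: it relies only on the ISO BMFF rule that every box starts with a 4-byte size followed by a 4-character type. The layout of compact_binary_v1 is not described in this article, so the function merely reports "not fMP4". looksLikeFmp4 and the list of box types are illustrative.

```typescript
// Box types that commonly open an fMP4 init or media segment (illustrative list).
const FMP4_LEADING_BOX_TYPES = new Set(["ftyp", "styp", "moov", "moof", "sidx"]);

/** Returns true if the payload starts with a plausible ISO BMFF box header. */
function looksLikeFmp4(payload: Uint8Array): boolean {
  if (payload.length < 8) return false; // too short for size + type
  // Bytes 0..3 hold the box size, bytes 4..7 the four-character box type.
  const boxType = String.fromCharCode(payload[4], payload[5], payload[6], payload[7]);
  return FMP4_LEADING_BOX_TYPES.has(boxType);
}

// Browser-only codec negotiation check, shown as a comment:
// const supported = MediaSource.isTypeSupported('video/mp4; codecs="avc1.42E01F"');
// if (!supported || !looksLikeFmp4(firstPayload)) { /* tear down MSE, start WebCodecs path */ }
```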
2. Fallback Path Core: Web Workers and Transferable Objects
When the client falls back to WebCodecs, to keep the browser UI thread responsive for window events and input routing, we move video processing and decoding to a background thread (decoder.worker.ts):
[ Wails / WebSocket Main Thread ]
│
▼ (Transferable Object ArrayBuffer, zero copy)
[ Web Worker Thread ]
│
├─► 1. Parse Envelope & Convert Annex-B to AVCC
├─► 2. WebCodecs (VideoDecoder) Hardware Decoding
├─► 3. Draw directly to OffscreenCanvas
│
▼ (Rendered directly in GPU)
[ Screen <canvas> Element ]
- Zero-Copy Frame Transfer: The main thread passes incoming video packets (ArrayBuffer) to the worker via postMessage(data, [data]). The array argument transfers memory ownership directly, avoiding expensive copy operations and garbage collection overhead.
- Offscreen Canvas Transfer: During initialization, the main thread transfers control of the UI <canvas> element using canvas.transferControlToOffscreen(). This allows the worker to submit draw calls directly to the GPU.
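The two transfers above can be sketched as follows. This is a minimal illustration under assumptions: the message shapes ("packet", "init") and the names PacketSink and forwardPacket are hypothetical, and the browser-only setup is left in comments.

```typescript
// Structural subset of Worker, so the sketch does not depend on DOM typings.
interface PacketSink {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
}

// Hypothetical message shape; the real protocol may differ.
interface PacketMessage {
  type: "packet";
  data: ArrayBuffer;
}

function forwardPacket(worker: PacketSink, packet: ArrayBuffer): void {
  const message: PacketMessage = { type: "packet", data: packet };
  // Listing the buffer in the transfer list moves ownership to the worker.
  // With a real Worker, packet.byteLength is 0 on this thread afterwards.
  worker.postMessage(message, [packet]);
}

// Browser-only setup, shown as comments because it needs a DOM:
// const worker = new Worker(new URL("./decoder.worker.ts", import.meta.url), { type: "module" });
// const offscreen = canvas.transferControlToOffscreen();
// worker.postMessage({ type: "init", canvas: offscreen }, [offscreen]);
// socket.binaryType = "arraybuffer";
// socket.onmessage = (ev) => forwardPacket(worker, ev.data);
```

Because a transferred buffer is detached on the sending side, the main thread must not read the packet again after forwarding it.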
3. Fallback Decoding: WebCodecs API Integration
Inside the worker, the WebCodecs VideoDecoder interfaces directly with the system's hardware decoder.
3.1 Annex-B to AVCC Format Conversion
Video streams generally use the Annex-B format, in which NAL units are separated by 0x00000001 or 0x000001 start codes. When configured with an avcC description, the WebCodecs VideoDecoder expects AVCC-formatted input instead: each NAL unit is prefixed with its 4-byte big-endian length, and the decoder is initialized with an avcC byte block containing the SPS/PPS metadata.
To handle this, we implement a fast scanner:
- Length Re-writing: Locates Annex-B start codes and writes the big-endian segment length in-place within a single memory buffer.
- Metadata Parsing: Extracts SPS/PPS bytes from the first keyframe to build the AVCC configuration block, initializing the decoder via decoder.configure(...).
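The two steps above can be illustrated with a simplified converter. Note the swap: for clarity this sketch copies NAL units into a new buffer instead of rewriting lengths in place, which also covers 3-byte start codes where a 4-byte length would not fit. The function names are hypothetical, and buildAvcC assumes a single SPS and a single PPS and omits the extra chroma and bit-depth fields that High-profile streams may carry.

```typescript
/** Split an Annex-B byte stream into NAL units with start codes removed. */
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const nals: Uint8Array[] = [];
  let start = -1; // offset of the current NAL payload, -1 before the first start code
  let i = 0;
  while (i + 3 <= data.length) {
    const isStartCode = data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1;
    if (!isStartCode) {
      i++;
      continue;
    }
    if (start >= 0) {
      // Trailing zeros belong to the next (4-byte) start code, not to the NAL.
      let end = i;
      while (end > start && data[end - 1] === 0) end--;
      if (end > start) nals.push(data.subarray(start, end));
    }
    start = i + 3;
    i = start;
  }
  if (start >= 0 && start < data.length) nals.push(data.subarray(start));
  return nals;
}

/** Re-pack Annex-B data as AVCC: each NAL prefixed with a 4-byte big-endian length. */
function annexBToAvcc(data: Uint8Array): Uint8Array {
  const nals = splitAnnexB(data);
  const total = nals.reduce((sum, nal) => sum + 4 + nal.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const nal of nals) {
    view.setUint32(offset, nal.length, false); // false = big-endian
    out.set(nal, offset + 4);
    offset += 4 + nal.length;
  }
  return out;
}

/** Build a minimal avcC configuration record from one SPS and one PPS NAL unit. */
function buildAvcC(sps: Uint8Array, pps: Uint8Array): Uint8Array {
  const out = new Uint8Array(11 + sps.length + pps.length);
  const view = new DataView(out.buffer);
  out[0] = 1; // configurationVersion
  out[1] = sps[1]; // AVCProfileIndication (profile_idc)
  out[2] = sps[2]; // profile_compatibility (constraint flags)
  out[3] = sps[3]; // AVCLevelIndication (level_idc)
  out[4] = 0xff; // reserved bits + lengthSizeMinusOne = 3 (4-byte lengths)
  out[5] = 0xe1; // reserved bits + numOfSequenceParameterSets = 1
  view.setUint16(6, sps.length, false);
  out.set(sps, 8);
  out[8 + sps.length] = 1; // numOfPictureParameterSets
  view.setUint16(9 + sps.length, pps.length, false);
  out.set(pps, 11 + sps.length);
  return out;
}
```

In an H.264 stream the SPS and PPS can be picked out of the split NAL units by the low five bits of the first byte (type 7 and type 8 respectively).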
3.2 Hardware Decoding
The VideoDecoder submits AVCC frames to the system's hardware decoder (such as DXVA on Windows, VideoToolbox on macOS, or VA-API on Linux). The decoder then emits a VideoFrame object that references the decoded image, which avoids CPU-side pixel copies when the frame stays in GPU memory.
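As an illustration, the configuration passed to decoder.configure(...) might be assembled as shown below. This is a sketch under assumptions: the helper names are hypothetical, the codec string is derived from the three bytes after the SPS NAL header, and hardwareAcceleration is only a hint that the browser may ignore.

```typescript
// Subset of the WebCodecs VideoDecoderConfig fields used by this sketch.
interface H264DecoderConfig {
  codec: string;
  description: Uint8Array;
  hardwareAcceleration: "prefer-hardware";
  optimizeForLatency: boolean;
}

/** Derive the "avc1.PPCCLL" codec string from an SPS NAL unit (header byte included). */
function h264CodecString(sps: Uint8Array): string {
  const hex = (byte: number): string => byte.toString(16).padStart(2, "0");
  return `avc1.${hex(sps[1])}${hex(sps[2])}${hex(sps[3])}`;
}

function buildDecoderConfig(sps: Uint8Array, avcC: Uint8Array): H264DecoderConfig {
  return {
    codec: h264CodecString(sps),
    description: avcC, // presence of a description selects AVCC-formatted input
    hardwareAcceleration: "prefer-hardware",
    optimizeForLatency: true, // ask the decoder not to hold frames back
  };
}

// Browser-only usage inside the worker, shown as comments:
// const decoder = new VideoDecoder({
//   output: (frame) => { /* draw the frame, then frame.close() */ },
//   error: (e) => console.error(e),
// });
// decoder.configure(buildDecoderConfig(sps, avcC));
// decoder.decode(new EncodedVideoChunk({ type: "key", timestamp: 0, data: avccFrame }));
```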
4. GPU Rendering: Direct Drawing via OffscreenCanvas
Sending decoded frames back to the main thread for rendering would introduce message-passing overhead.
Since the worker controls the canvas, it draws the VideoFrame directly to the OffscreenCanvasRenderingContext2D:
// Render the VideoFrame to the canvas via GPU copy
ctx.drawImage(videoFrame, 0, 0, canvas.width, canvas.height);
// Release the frame immediately to prevent memory leaks
videoFrame.close();
The browser handles scaling and rendering on the GPU, keeping the main JS thread free for Wails window actions and input routing.
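Since an unclosed VideoFrame keeps decoder resources alive, it can be worth guaranteeing the close() call even when drawing throws. The following sketch wraps the two calls above in try/finally. renderFrame and the two interfaces are hypothetical, and they are typed structurally so the same function accepts a real OffscreenCanvasRenderingContext2D and VideoFrame.

```typescript
// Structural subsets of VideoFrame and the 2D context, to avoid DOM typings.
interface ClosableFrame {
  close(): void;
}

interface FrameCanvas<F> {
  drawImage(frame: F, dx: number, dy: number, dw: number, dh: number): void;
}

/** Draw a decoded frame scaled to the canvas size and always release it. */
function renderFrame<F extends ClosableFrame>(
  ctx: FrameCanvas<F>,
  frame: F,
  width: number,
  height: number,
): void {
  try {
    ctx.drawImage(frame, 0, 0, width, height);
  } finally {
    // Runs even if drawImage throws, so the frame is never leaked.
    frame.close();
  }
}
```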
5. Backpressure and Queue Management
When network bandwidth drops or decoding slows down, frames can stack up in the decode queue, causing input lag. We implement a backpressure algorithm inside the worker to manage this:
                      ┌───────────────────────────────┐
                      │    Monitor decodeQueueSize    │
                      └───────────────┬───────────────┘
                                      │
       ┌──────────────────────────────┼──────────────────────────────┐
       ▼ (Size > Q_SOFT 8~10)         ▼ (Size > Q_HARD 16~18)        ▼ (Latency > 1000ms)
[ Frame Discard Policy ]   [ Decoder Reset Policy ]         [ Catch-Up Policy ]
Discard P-frames until     Reset VideoDecoder,              Clear queue buffers,
next keyframe (IDR).       request keyframe from sender.    decode next keyframe.
- Soft Limit (Q_SOFT = 8 or 10): If the decode queue size exceeds this threshold, the worker discards incoming P-frames until it receives the next H.264 IDR keyframe, preventing display lag.
- Hard Limit (Q_HARD = 16 or 18): If the queue grows past this limit, the worker resets the hardware decoder via decoder.reset() and requests an immediate sync frame from the sender.
- Live Catch-up Mode: If the difference between the frame timestamp and the local clock exceeds 1000ms, the client discards the queue and decodes the next keyframe to catch up.
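The three policies can be expressed as a small decision function that runs once per incoming frame. This is a minimal sketch rather than the worker's actual algorithm: BackpressureGate, its action names, and the ordering of the checks are assumptions, and the frame that triggers a reset or catch-up is itself dropped for simplicity.

```typescript
type FrameAction = "decode" | "drop" | "reset" | "catch-up";

interface BackpressureLimits {
  qSoft: number; // soft queue limit, e.g. 8
  qHard: number; // hard queue limit, e.g. 16
  maxLatencyMs: number; // catch-up threshold, e.g. 1000
}

class BackpressureGate {
  private limits: BackpressureLimits;
  private waitingForKeyframe = false;

  constructor(limits: BackpressureLimits) {
    this.limits = limits;
  }

  /** Decide what to do with one incoming frame. */
  decide(queueSize: number, isKeyframe: boolean, latencyMs: number): FrameAction {
    if (queueSize > this.limits.qHard) {
      // Hard limit: caller resets the decoder and asks the sender for a keyframe.
      this.waitingForKeyframe = true;
      return "reset";
    }
    if (!this.waitingForKeyframe && latencyMs > this.limits.maxLatencyMs) {
      // Too far behind the live edge: caller clears its queue buffers.
      this.waitingForKeyframe = true;
      return "catch-up";
    }
    if (queueSize > this.limits.qSoft) {
      // Soft limit: start discarding P-frames.
      this.waitingForKeyframe = true;
    }
    if (this.waitingForKeyframe) {
      if (!isKeyframe) return "drop";
      this.waitingForKeyframe = false; // a keyframe lets decoding resume cleanly
    }
    return "decode";
  }
}
```

In the worker, queueSize would come from decoder.decodeQueueSize. On a "reset" action the caller also has to call decoder.configure(...) again, because reset() returns the decoder to its unconfigured state.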
Conclusion
By prioritizing MSE hardware-accelerated playback and using WebCodecs + OffscreenCanvas as a fallback path, the remote control client achieves wide compatibility, high frame rates, and sub-150ms screen updates in the browser. In the next and final part of this series, we will examine adaptive transmission optimizations and telemetry-driven bitrate/FPS tuning.
