Series Intro | Real-Time Screen Capture & Codec Optimization Architecture Overview

When building a modern remote assistance and screen-sharing system (such as Easy Connect SSH and Easy Link Assist in the Easy Connect system), the core technical benchmark is keeping latency very low and image quality high under complex network conditions.

To achieve sub-150ms end-to-end latency over public networks, and to keep interaction smooth across mobile phones, computers, and web browsers, the Easy Connect system avoids traditional CPU-bound screenshot capture and slow software encoding. Instead, it implements a standardized, hardware-accelerated, zero-copy H.264/AVC real-time video streaming pipeline on every platform.


1. Core Architecture Design

Throughout a remote assistance session, devices are classified into two core roles:

  • Controlled Role (Sender): Captures local screen pixels, feeds them to the hardware encoder without copying them out of GPU memory, and packages the encoded stream using a compact binary protocol.
  • Controller Role (Receiver): Receives and distributes stream data, decodes it in real time with hardware decoders, and renders the decoded frames without copying them out of GPU memory.

The two roles establish bi-directional, low-latency data communication over an underlying multiplexed (Yamux) secure tunnel. The overall pipeline architecture is designed as follows:

         Controlled Role (Sender)                             Controller Role (Receiver)
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  GPU-Based Capture (SCK/DXGI/etc.)  │               │      Network Transport (Yamux)      │
└──────────────────┬──────────────────┘               └──────────────────┬──────────────────┘
                   │ GPU Native Texture                                  │ H.264 stream bytes
                   ▼                                                     ▼
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│    Zero-Copy H.264 Hardware Enc     │               │   Hardware H.264 Decoder (Surface)  │
└──────────────────┬──────────────────┘               └──────────────────┬──────────────────┘
                   │ Annex-B + CSD (SPS/PPS)                             │ GPU Output Texture
                   ▼                                                     ▼
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  Compact Binary Envelope Packaging  │ ──Yamux Mux──▶│     Zero-Copy Preview Rendering     │
└─────────────────────────────────────┘               └─────────────────────────────────────┘
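
The envelope layout is not specified in this overview. As a minimal sketch, assuming a hypothetical 18-byte big-endian header (kind, flags, sequence number, presentation timestamp, payload length), length-prefixed framing over a byte stream such as a Yamux stream could look like the following. Every field name and constant here is illustrative and is not the Easy Connect wire format.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Hypothetical header layout (an assumption, not the real wire format):
# kind (u8) | flags (u8) | sequence (u32) | pts_us (u64) | payload_len (u32)
HEADER = struct.Struct(">BBIQI")

KIND_CONFIG = 0    # codec-specific data (SPS/PPS)
KIND_KEYFRAME = 1  # IDR access unit
KIND_DELTA = 2     # non-IDR access unit


@dataclass(frozen=True)
class Envelope:
    kind: int
    flags: int
    sequence: int
    pts_us: int
    payload: bytes  # Annex-B bytes, passed through untouched


def pack_envelope(env: Envelope) -> bytes:
    """Serialize one envelope: fixed header followed by the raw payload."""
    header = HEADER.pack(env.kind, env.flags, env.sequence, env.pts_us, len(env.payload))
    return header + env.payload


def unpack_envelopes(buffer: bytearray) -> list[Envelope]:
    """Pop every complete envelope from the front of buffer; keep any partial tail."""
    out = []
    while len(buffer) >= HEADER.size:
        kind, flags, seq, pts_us, length = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # payload not fully received yet; wait for more bytes
        out.append(Envelope(kind, flags, seq, pts_us, bytes(buffer[HEADER.size:end])))
        del buffer[:end]
    return out
```

In this sketch the receiver appends each network read to a single bytearray and calls unpack_envelopes after every read, so envelopes that straddle read boundaries are reassembled without extra bookkeeping.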

2. Core Technical Challenges and Optimization Dimensions

To push end-to-end latency as low as the hardware and network allow, we resolved several key technical challenges across our platform implementations:

  1. Physical-Level Zero-Copy (Zero-Copy Pipeline): From screen capture (such as iOS ReplayKit, Android MediaProjection, Windows DXGI, etc.) to encoder input, pixel data remains within GPU memory, avoiding expensive CPU-GPU copies and pixel format conversions.
  2. Color Space Consistency (Color Matrix Calibration): Eliminated the green or pink color distortions often observed during cross-platform streaming (especially from iOS to Android/Web) by resolving mismatches between the BT.709 (Rec. 709) and BT.601 YUV conversion matrix coefficients.
  3. Adaptive Parameter Tuning (Telemetry-Driven Adaptation): Established a 100ms feedback loop based on network telemetry. The loop adjusts bitrate and frame rate without restarting the encoder, which prevents screen flickering and keeps the stream alive on weak connections.
  4. On-Demand IDR Sync (On-Demand Keyframe Control): Replaced bandwidth-heavy periodic keyframes with on-demand ones. The receiver detects packet loss and sends an encoder_sync command, forcing the sender to generate an IDR keyframe immediately and clear image artifacts.
  5. Browser High-Performance Playback (MSE with WebCodecs fallback): Inside the Wails controller frontend, fMP4 streams are played preferentially through hardware-accelerated Media Source Extensions (MSE). If MSE does not support the stream configuration, or the stream arrives in custom raw H.264 envelopes, playback falls back to hardware decoding via WebCodecs with OffscreenCanvas GPU compositing in a background worker.
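
The color matrix mismatch in item 2 can be reproduced numerically. The sketch below uses the published luma coefficients for BT.601 (Kr = 0.299, Kb = 0.114) and BT.709 (Kr = 0.2126, Kb = 0.0722) on normalized full-range values. It is an illustration of the effect only, not the Easy Connect conversion code; the function names are made up, and real pipelines must also agree on limited versus full range.

```python
BT601 = (0.299, 0.114)    # (Kr, Kb)
BT709 = (0.2126, 0.0722)  # (Kr, Kb)


def rgb_to_ycbcr(rgb, matrix):
    """Convert normalized RGB (0..1) to Y (0..1) and Cb/Cr (-0.5..0.5)."""
    r, g, b = rgb
    kr, kb = matrix
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(ycbcr, matrix):
    """Inverse of rgb_to_ycbcr for the same matrix (no clipping applied)."""
    y, cb, cr = ycbcr
    kr, kb = matrix
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


def max_channel_error(rgb, encode_matrix, decode_matrix):
    """Largest per-channel difference after an encode/decode round trip."""
    decoded = ycbcr_to_rgb(rgb_to_ycbcr(rgb, encode_matrix), decode_matrix)
    return max(abs(a - b) for a, b in zip(rgb, decoded))


if __name__ == "__main__":
    green = (0.0, 1.0, 0.0)
    print("matched  :", max_channel_error(green, BT709, BT709))
    print("mismatch :", max_channel_error(green, BT709, BT601))
```

With matching matrices the round trip is exact up to floating-point error. When the sender encodes with BT.709 and the receiver decodes with BT.601, saturated colors shift visibly, which is why both ends need to signal and honor the same matrix.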

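Items 3 and 4 describe receiver-driven feedback. The overview does not state the adaptation rule, so the sketch below swaps in a simple decrease-on-congestion, increase-when-clear rule and a sequence-gap detector. All thresholds, step sizes, and class names are assumptions chosen for illustration; only the 100ms interval and the encoder_sync command name come from the text above.

```python
from dataclasses import dataclass


@dataclass
class EncoderTarget:
    bitrate_kbps: int
    fps: int


class AdaptiveController:
    """Adjusts encoder targets from periodic telemetry (hypothetical thresholds)."""

    def __init__(self, min_kbps=300, max_kbps=8000, min_fps=10, max_fps=60):
        self.min_kbps, self.max_kbps = min_kbps, max_kbps
        self.min_fps, self.max_fps = min_fps, max_fps
        self.target = EncoderTarget(bitrate_kbps=max_kbps, fps=max_fps)

    def on_telemetry(self, loss_ratio: float, rtt_ms: float) -> EncoderTarget:
        """Call once per feedback interval (100 ms in the article)."""
        t = self.target
        congested = loss_ratio > 0.02 or rtt_ms > 200  # assumed thresholds
        if congested:
            # Cut bitrate first; only reduce frame rate once bitrate hits its floor.
            if t.bitrate_kbps > self.min_kbps:
                t.bitrate_kbps = max(self.min_kbps, int(t.bitrate_kbps * 0.8))
            else:
                t.fps = max(self.min_fps, t.fps - 5)
        else:
            # Restore frame rate first, then raise bitrate gradually.
            if t.fps < self.max_fps:
                t.fps = min(self.max_fps, t.fps + 5)
            else:
                t.bitrate_kbps = min(self.max_kbps, t.bitrate_kbps + 200)
        return t


class KeyframeRequester:
    """Detects sequence gaps and rate-limits keyframe requests."""

    def __init__(self, min_interval_ms=500):
        self.min_interval_ms = min_interval_ms
        self.expected_seq = None
        self.last_request_ms = None

    def on_frame(self, seq: int, now_ms: int) -> bool:
        """Return True when the caller should send an encoder_sync request."""
        gap = self.expected_seq is not None and seq != self.expected_seq
        self.expected_seq = seq + 1
        if not gap:
            return False
        recent = (self.last_request_ms is not None
                  and now_ms - self.last_request_ms < self.min_interval_ms)
        if recent:
            return False  # an earlier request is probably still in flight
        self.last_request_ms = now_ms
        return True
```

The new targets would be applied through whatever runtime reconfiguration the platform encoder exposes, so the encoder session is never torn down. Rate-limiting the keyframe request keeps a burst of losses from triggering a burst of large IDR frames.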
3. Deep Dive Series Guides

To help developers and technology enthusiasts understand the underlying details of this architecture, we have organized the implementation details into a 5-part technical series:

Real-time video streaming technologies continue to evolve rapidly. The Easy Connect system leverages these low-latency optimizations to deliver a fast, responsive, and seamless cross-platform remote control experience. Select any of the links above to read the specific technical implementation details.

Released under the MIT License.