Series Intro | Real-Time Screen Capture & Codec Optimization Architecture Overview
When building a modern remote assistance and screen-sharing system (such as Easy Connect SSH and Easy Link Assist in the Easy Connect system), the core technical requirement is keeping latency very low and image quality high under unstable network conditions.
To achieve sub-150 ms end-to-end latency over the public internet, and to keep interaction smooth across mobile phones, computers, and web browsers, the Easy Connect system avoids CPU-bound screenshot capture and slow software encoding. Instead, it implements a standardized, hardware-accelerated, zero-copy H.264/AVC real-time video streaming pipeline on every platform.
1. Core Architecture Design
Throughout a remote assistance session, devices are classified into two core roles:
- Controlled Role (Sender): Responsible for capturing local screen pixels, performing zero-copy encoding on the GPU, and packaging the stream using a compact binary protocol.
- Controller Role (Receiver): Responsible for receiving and distributing stream data, using hardware decoder chips to decode in real-time, and rendering the frames with zero copy for high performance.
The two roles establish bi-directional, low-latency data communication over an underlying multiplexed (Yamux) secure tunnel. The overall pipeline architecture is designed as follows:
       Controlled Role (Sender)                             Controller Role (Receiver)
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  GPU-Based Capture (SCK/DXGI/etc.)  │               │      Network Transport (Yamux)      │
└──────────────────┬──────────────────┘               └──────────────────┬──────────────────┘
                   │ GPU Native Texture                                  │ H.264 stream bytes
                   ▼                                                     ▼
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  Zero-Copy H.264 Hardware Encoder   │               │  Hardware H.264 Decoder (Surface)   │
└──────────────────┬──────────────────┘               └──────────────────┬──────────────────┘
                   │ Annex-B + CSD (SPS/PPS)                             │ GPU Output Texture
                   ▼                                                     ▼
┌─────────────────────────────────────┐               ┌─────────────────────────────────────┐
│  Compact Binary Envelope Packaging  │ ──Yamux Mux──▶│     Zero-Copy Preview Rendering     │
└─────────────────────────────────────┘               └─────────────────────────────────────┘
2. Core Technical Challenges and Optimization Dimensions
To push end-to-end latency as low as the hardware and network allow, we resolved several key technical challenges across our platform implementations:
- Physical Level Zero-Copy (Zero-Copy Pipeline): From screen capture (such as iOS ReplayKit, Android MediaProjection, Windows DXGI, etc.) to encoder input, pixel data remains within GPU memory, avoiding expensive CPU-GPU copies and pixel format conversions.
- Color Space Consistency (Color Matrix Calibration): Eliminated the green or pink color distortions often observed during cross-platform streaming (especially from iOS to Android/Web) by resolving mismatches between the BT.709 (Rec. 709) and BT.601 YUV conversion matrix coefficients used by the sender and the receiver.
- Adaptive Parameter Tuning (Telemetry-Driven Adaptation): Established a 100ms feedback loop based on network telemetry. This dynamically scales bitrates and FPS without restarting the encoder, preventing screen flickering and keeping the stream alive on weak connections.
- On-Demand IDR Sync (On-Demand Keyframe Control): Abandoned high-bandwidth periodic keyframes. The receiver detects packet loss and triggers a fast encoder_sync command, forcing the sender to instantly generate an IDR keyframe to clear image artifacts.
- Browser High-Performance Playback (MSE with WebCodecs fallback): Inside the Wails controller frontend, fMP4 streams are preferably played via hardware-accelerated Media Source Extensions (MSE). If the player doesn't support the configuration, or the stream uses custom raw H.264 envelopes, playback falls back to hardware decoding via WebCodecs and OffscreenCanvas GPU compositing in a background worker.
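To make the color-matrix item above concrete, here is a minimal sketch of what happens when a sender converts RGB to YCbCr with one set of coefficients and the receiver converts back with the other. The luma coefficients are the published BT.601 and BT.709 values. Everything else is an assumption for illustration: the function names are hypothetical, values are normalized full-range floats, and a real pipeline would also have to handle limited (video) range and 8-bit quantization, which this sketch ignores.

```python
from typing import Tuple

Triple = Tuple[float, float, float]

# Luma coefficients (Kr, Kb) from the two standards; Kg = 1 - Kr - Kb.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)


def rgb_to_ycbcr(rgb: Triple, coeffs: Tuple[float, float]) -> Triple:
    """Convert normalized full-range RGB to YCbCr with the given (Kr, Kb)."""
    kr, kb = coeffs
    r, g, b = rgb
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


def ycbcr_to_rgb(ycbcr: Triple, coeffs: Tuple[float, float]) -> Triple:
    """Invert rgb_to_ycbcr for the given (Kr, Kb). Output is not clipped."""
    kr, kb = coeffs
    y, cb, cr = ycbcr
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b


def max_channel_error(rgb: Triple, encode_coeffs, decode_coeffs) -> float:
    """Largest per-channel difference after an encode/decode round trip."""
    decoded = ycbcr_to_rgb(rgb_to_ycbcr(rgb, encode_coeffs), decode_coeffs)
    return max(abs(a - b) for a, b in zip(rgb, decoded))


if __name__ == "__main__":
    green = (0.0, 1.0, 0.0)
    print("matched matrices:   ", max_channel_error(green, BT709, BT709))
    print("mismatched matrices:", max_channel_error(green, BT709, BT601))
```

With matching matrices the round trip is lossless up to floating-point error. With mismatched matrices, saturated colors shift visibly while neutral grays are unaffected, because their chroma components are zero under either matrix. This is why the sender and the receiver need to agree on the matrix rather than each using a platform default.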
3. Deep Dive Series Guides
To help developers and technology enthusiasts understand the underlying details of this architecture, we have organized the implementation details into a 5-part technical series:
- Part 1 | Deep Dive: iOS Real-Time Screen Capture & VideoToolbox Hardware Encoding
  - Learn how the iOS Broadcast Extension captures display frames using ReplayKit and compresses them using VideoToolbox, all within Apple's strict 50 MB memory limit.
- Part 2 | Practices on Android Real-Time Screen Sharing and MediaCodec Hardware Codec Optimization
  - Examine Android MediaProjection capture, OpenGL ES shaders for frame rotation and scaling, and zero-copy MediaCodec Surface-mode encoding/decoding with error recovery.
- Part 3 | High-Performance Screen Capture and Codec Architecture Across Windows, macOS, and Linux
  - An under-the-hood analysis of Windows DXGI/Media Foundation zero-copy MFT encoding, macOS ScreenCaptureKit, Linux PipeWire buffers, and dynamic system loading (dlopen) for portable binaries.
- Part 4 | Frontend Hardware-Accelerated Decoding and Ultra-Low-Latency Rendering with WebCodecs & OffscreenCanvas
  - Explore MSE hardware-accelerated playback as the primary path, with WebCodecs GPU decoding and OffscreenCanvas rendering in a Web Worker as the fallback path.
- Part 5 | Low-Latency Video Streaming and Telemetry-Driven Adaptive Bitrate Optimizations in Weak Networks
  - A breakdown of adaptive transmission: in-place encoder property overrides, telemetry-driven bitrate scaling, on-demand IDR synchronization, and custom compact_binary_v1 envelopes.
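As a preview of the Part 5 material, the sketch below illustrates the general shape of telemetry-driven bitrate scaling and on-demand keyframe requests. It is not the Easy Connect implementation. The class names, thresholds, step sizes, bitrate bounds, and the per-frame sequence number are all hypothetical assumptions chosen for illustration; the source only states that a 100 ms telemetry loop adjusts bitrate and FPS in place, and that the receiver requests an IDR frame via encoder_sync after detecting loss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Telemetry:
    rtt_ms: float      # round-trip time measured over the last interval
    loss_ratio: float  # fraction of lost packets, 0.0 to 1.0


@dataclass
class EncoderTarget:
    bitrate_kbps: int
    fps: int


class AdaptiveController:
    """Sender side: one call to on_tick per telemetry interval (e.g. 100 ms)."""

    def __init__(self, min_kbps: int = 300, max_kbps: int = 8000,
                 start_kbps: int = 4000) -> None:
        self.min_kbps = min_kbps
        self.max_kbps = max_kbps
        self.target = EncoderTarget(bitrate_kbps=start_kbps, fps=30)

    def on_tick(self, sample: Telemetry) -> EncoderTarget:
        # Hypothetical congestion thresholds: 2% loss or 150 ms RTT.
        congested = sample.loss_ratio > 0.02 or sample.rtt_ms > 150
        if congested:
            kbps = self.target.bitrate_kbps * 4 // 5   # back off by 20%
        else:
            kbps = self.target.bitrate_kbps + 100      # probe upward slowly
        kbps = max(self.min_kbps, min(self.max_kbps, kbps))
        fps = 15 if kbps < 1000 else 30                # trade smoothness for clarity
        # The new target would be applied to the running encoder in place,
        # without tearing it down and recreating it.
        self.target = EncoderTarget(bitrate_kbps=kbps, fps=fps)
        return self.target


class KeyframeRequester:
    """Receiver side: flags a sequence gap so the caller can request an IDR frame."""

    def __init__(self) -> None:
        self.expected_seq: Optional[int] = None

    def on_frame(self, seq: int) -> bool:
        gap = self.expected_seq is not None and seq != self.expected_seq
        self.expected_seq = seq + 1
        return gap  # True: this is where encoder_sync would be sent
```

The multiplicative decrease with additive increase keeps the stream from oscillating: the bitrate drops quickly when the link degrades and recovers gradually once conditions improve. Requesting keyframes only when a gap is seen avoids the steady bandwidth cost of periodic IDR frames.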
Real-time video streaming technologies continue to evolve rapidly. The Easy Connect system leverages these low-latency optimizations to deliver a fast, responsive, and seamless cross-platform remote control experience. Select any of the links above to read the specific technical implementation details.
