Part 1 | Deep Dive: iOS Real-Time Screen Capture & VideoToolbox Hardware Encoding
When building a high-throughput, ultra-low-latency remote assistance system, screen capture and video compression are the most critical links in the chain. To achieve sub-150ms end-to-end latency, the source pipeline must avoid unnecessary memory copies and wasted CPU cycles.
The iOS ecosystem imposes strict constraints on developers through sandboxing and resource management. For instance, the Broadcast Upload Extension (the background process that performs system-wide screen capture) is capped at a 50MB memory limit, and exceeding it causes the OS to terminate the process immediately.
In this deep dive, we explore how the Easy Connect Suite uses the native ReplayKit and VideoToolbox frameworks on iOS to build a zero-copy, low-latency screen capture and encoding pipeline.
1. The Screen Capture Framework: In-App vs. Broadcast Extensions
On iOS, we support two separate capture modes depending on the remote assistance scenario:
```
┌────────────────────────────────────────┐
│           iOS Screen Capture           │
└───────────────────┬────────────────────┘
                    │
  ┌─────────────────┴─────────────────┐
  ▼                                   ▼
[ In-App Capture ]            [ System Broadcast ]
Uses RPScreenRecorder         Uses Broadcast Extension
Captures host app only        Captures full OS screen
Normal app memory budget      Strict 50MB RAM limit
```
In-App Capture
For light support sessions limited to our host application, we record with ReplayKit's RPScreenRecorder singleton (RPScreenRecorder.shared(), started with startCapture(handler:completionHandler:)). The capture runs directly inside the host process, which avoids IPC (Inter-Process Communication) and the 50MB extension limit.
System-Wide Capture
When a user shares their entire screen (including other apps and the iOS SpringBoard home screen), we rely on an iOS Broadcast Upload Extension whose principal class subclasses RPBroadcastSampleHandler.
This runs as a separate background process:
- Initiation: The user launches screen recording from the iOS Control Center and selects our extension.
- Frame Delivery: The OS captures the display and passes frames as CMSampleBuffer objects to our handler's processSampleBuffer(_:with:) callback.
- Low-Footprint Routing: Because of the 50MB RAM limit, we cannot perform heavy pixel copies or software compression inside the extension. Frames must be fed directly to the hardware encoder or piped to the host process over local Unix sockets.
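To see why per-frame copies are off the table inside the extension, it helps to put numbers on a single frame. The sketch below computes frame sizes for a 32-bit BGRA layout and for an 8-bit bi-planar YUV 4:2:0 layout, and how many whole copies fit in a 50MB budget. The 1170×2532 resolution, reading "50MB" as 50 MiB, and the helper names (PixelLayout, framesThatFit) are illustrative assumptions; row padding and every other allocation in the extension are ignored, so real headroom is smaller.

```swift
/// Approximate bytes for one frame, ignoring bytesPerRow padding.
enum PixelLayout {
    case bgra            // 4 bytes per pixel
    case yuv420BiPlanar  // full-resolution Y plane + half-resolution interleaved CbCr plane

    func frameBytes(width: Int, height: Int) -> Int {
        switch self {
        case .bgra:
            return width * height * 4
        case .yuv420BiPlanar:
            // One Cb/Cr pair (2 bytes) per 2x2 block of luma samples
            let chromaPairs = ((width + 1) / 2) * ((height + 1) / 2)
            return width * height + chromaPairs * 2
        }
    }
}

/// Whole frame copies that fit in a memory budget (default: 50MB read as 50 MiB),
/// before counting any other allocation the extension makes.
func framesThatFit(_ layout: PixelLayout, width: Int, height: Int,
                   budget: Int = 50 * 1024 * 1024) -> Int {
    return budget / layout.frameBytes(width: width, height: height)
}

let bgraCopies = framesThatFit(.bgra, width: 1170, height: 2532)           // 4
let yuvCopies = framesThatFit(.yuv420BiPlanar, width: 1170, height: 2532)  // 11
print("BGRA copies: \(bgraCopies), YUV copies: \(yuvCopies)")
```

At 1170×2532, one BGRA copy is roughly 11.3 MiB, so four copies consume about 45 MiB before the extension has done any other work; the YUV buffer is about 4.2 MiB.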
2. Establishing the Zero-Copy VideoToolbox Pipeline
When ReplayKit delivers a CMSampleBuffer, we compress it into H.264 immediately. To do so, we hand the frame's IOSurface-backed pixel buffer directly to VideoToolbox's hardware encoder (VTCompressionSession), without copying pixels.
VTCompressionSession Configuration
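We create the hardware compression session with VTCompressionSessionCreate and configure it as follows:
- Codec Format: kCMVideoCodecType_H264, passed as the codecType argument at creation.
- H.264 Profile (kVTCompressionPropertyKey_ProfileLevel): kVTProfileLevel_H264_Baseline_AutoLevel, which has no B-frames and therefore never holds frames back for reordering, or kVTProfileLevel_H264_Main_AutoLevel for high-quality connections. Because Main permits B-frames, kVTCompressionPropertyKey_AllowFrameReordering should be set to false to keep the same low-latency behavior.
- Real-Time Flag: We enable kVTCompressionPropertyKey_RealTime, which tells the encoder that compression must keep pace with live capture, so it favors low-latency output over compression efficiency.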
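The configuration above can be sketched as follows. This is a minimal example under stated assumptions, not the production setup: the function name makeLowLatencyEncoder and the highQuality switch are hypothetical, and it disables frame reordering so that the Main profile does not introduce B-frames. The code is wrapped in `#if canImport(VideoToolbox)` so it only compiles on Apple platforms.

```swift
#if canImport(VideoToolbox)
import CoreMedia
import VideoToolbox

/// Creates an H.264 session tuned for low latency. `highQuality` is a
/// hypothetical stand-in for "the connection can afford the Main profile".
func makeLowLatencyEncoder(width: Int32, height: Int32, highQuality: Bool) -> VTCompressionSession? {
    var session: VTCompressionSession?
    let status = VTCompressionSessionCreate(
        allocator: kCFAllocatorDefault,
        width: width,
        height: height,
        codecType: kCMVideoCodecType_H264,   // the codec is fixed at creation time
        encoderSpecification: nil,
        imageBufferAttributes: nil,
        compressedDataAllocator: nil,
        outputCallback: nil,                 // frames are encoded with the block-based EncodeFrame API
        refcon: nil,
        compressionSessionOut: &session)
    guard status == noErr, let session = session else { return nil }

    let profile = highQuality ? kVTProfileLevel_H264_Main_AutoLevel
                              : kVTProfileLevel_H264_Baseline_AutoLevel
    var ok = VTSessionSetProperty(session, key: kVTCompressionPropertyKey_ProfileLevel, value: profile) == noErr
    // Ask the encoder to keep pace with live capture rather than maximize compression
    ok = ok && VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue) == noErr
    // No frame reordering, so no frame waits on a later one (relevant for Main)
    ok = ok && VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AllowFrameReordering, value: kCFBooleanFalse) == noErr
    guard ok else {
        VTCompressionSessionInvalidate(session)
        return nil
    }
    VTCompressionSessionPrepareToEncodeFrames(session)
    return session
}
#endif
```

Passing nil for outputCallback selects the block-based VTCompressionSessionEncodeFrame variant, which delivers each encoded frame to a closure supplied with that call instead of a session-wide C callback.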
Zero-Copy Path from Capture to Encoder
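A typical software encoding path must read pixel data back from GPU-managed memory into CPU buffers and, when the source is RGB, convert it to YUV on the CPU before the encoder can process it.
VideoToolbox's hardware encoder instead reads the IOSurface-backed buffer that ReplayKit delivers, which is typically already in a bi-planar YUV 4:2:0 format: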
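```
[ ReplayKit Capture ]
        │
        ▼ (CMSampleBuffer wrapper)
[ CoreVideo CVPixelBuffer (IOSurface-backed memory) ]
        │
        ▼ (Direct pointer reference, zero CPU memory copies)
[ VTCompressionSessionEncodeFrame ]
        │
        ▼ (Dedicated hardware video encoder)
[ H.264 NAL units (AVCC, length-prefixed) ]
```
We extract the underlying CVPixelBufferRef with CMSampleBufferGetImageBuffer(sampleBuffer) and pass this reference directly to VTCompressionSessionEncodeFrame. The pixel data stays in the same IOSurface-backed memory throughout capture and compression, which minimizes CPU work and memory footprint and keeps power draw and device temperature low. Note that VideoToolbox emits H.264 in AVCC form: each NAL unit is prefixed with its length, and the SPS/PPS parameter sets are carried in the sample's format description rather than in the data. Receivers that expect an Annex-B byte stream need the length prefixes replaced with start codes and the parameter sets inserted ahead of keyframes.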
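Because VideoToolbox emits length-prefixed AVCC NAL units and keeps SPS/PPS in the format description, a sender that streams Annex-B needs a small conversion step. The sketch below pairs a platform-independent AVCC-to-Annex-B converter with the encode call described above, which passes the CVPixelBuffer to VTCompressionSessionEncodeFrame by reference. The function names and the `emit` callback (standing in for whatever transport sends the bytes) are hypothetical.

```swift
/// Rewrites AVCC (length-prefixed) NAL units as an Annex-B byte stream by
/// replacing each big-endian length prefix with a 4-byte start code.
/// Returns nil if a length prefix points past the end of the buffer.
func annexB(fromAVCC avcc: [UInt8], nalLengthSize: Int = 4) -> [UInt8]? {
    let startCode: [UInt8] = [0, 0, 0, 1]
    var out: [UInt8] = []
    out.reserveCapacity(avcc.count + 16)
    var offset = 0
    while offset < avcc.count {
        guard avcc.count - offset >= nalLengthSize else { return nil }
        var length = 0
        for i in 0..<nalLengthSize {
            length = (length << 8) | Int(avcc[offset + i])
        }
        offset += nalLengthSize
        guard length <= avcc.count - offset else { return nil }
        out += startCode
        out += avcc[offset..<(offset + length)]
        offset += length
    }
    return out
}

#if canImport(VideoToolbox)
import CoreMedia
import VideoToolbox

/// True when the encoded sample is a sync (IDR) frame.
func isKeyframe(_ sample: CMSampleBuffer) -> Bool {
    guard let attachments = CMSampleBufferGetSampleAttachmentsArray(sample, createIfNecessary: false) as? [[CFString: Any]],
          let first = attachments.first else { return true }  // no NotSync attachment means sync
    return (first[kCMSampleAttachmentKey_NotSync] as? Bool) != true
}

/// Passes the capture buffer's CVPixelBuffer to the encoder by reference (no pixel
/// copy) and hands Annex-B bytes to `emit`, a hypothetical transport callback.
func encodeFrame(_ sampleBuffer: CMSampleBuffer, session: VTCompressionSession,
                 emit: @escaping ([UInt8]) -> Void) -> OSStatus {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return kVTParameterErr }
    return VTCompressionSessionEncodeFrame(
        session,
        imageBuffer: pixelBuffer,
        presentationTimeStamp: CMSampleBufferGetPresentationTimeStamp(sampleBuffer),
        duration: .invalid,
        frameProperties: nil,
        infoFlagsOut: nil
    ) { status, _, encoded in
        guard status == noErr, let encoded = encoded,
              let block = CMSampleBufferGetDataBuffer(encoded),
              let format = CMSampleBufferGetFormatDescription(encoded) else { return }
        var out: [UInt8] = []
        var parameterSetCount = 0
        var nalLength: Int32 = 4
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
            format, parameterSetIndex: 0, parameterSetPointerOut: nil, parameterSetSizeOut: nil,
            parameterSetCountOut: &parameterSetCount, nalUnitHeaderLengthOut: &nalLength)
        if isKeyframe(encoded) {
            // SPS/PPS live in the format description, not in the sample data
            for index in 0..<parameterSetCount {
                var pointer: UnsafePointer<UInt8>?
                var size = 0
                if CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
                    format, parameterSetIndex: index, parameterSetPointerOut: &pointer,
                    parameterSetSizeOut: &size, parameterSetCountOut: nil,
                    nalUnitHeaderLengthOut: nil) == noErr, let pointer = pointer {
                    out += [0, 0, 0, 1]
                    out += UnsafeBufferPointer(start: pointer, count: size)
                }
            }
        }
        let length = CMBlockBufferGetDataLength(block)
        var avcc = [UInt8](repeating: 0, count: length)
        guard CMBlockBufferCopyDataBytes(block, atOffset: 0, dataLength: length,
                                         destination: &avcc) == kCMBlockBufferNoErr,
              let payload = annexB(fromAVCC: avcc, nalLengthSize: Int(nalLength)) else { return }
        emit(out + payload)
    }
}
#endif
```

Only the compressed bitstream is copied out of the block buffer here; the raw pixels never leave the IOSurface.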
3. Avoiding the Green/Pink Tint: YUV Color Matrix Calibrations
When streaming video from an iOS capture client to Android, Windows, or Web decoders, developers often find that the iOS screen colors appear distorted on the receiving end, with a distinct green or pink tint.
This color shift is caused by mismatches in YUV-to-RGB conversion matrix coefficients.
By default, when capturing screens at resolutions of 720p and above, VideoToolbox tags the output stream's Video Usability Information (VUI) with Rec. 709 (HD video standard) color properties. However, many decoding and rendering paths (such as Android's MediaCodec pipeline and WebGL-based canvas renderers in Chrome) default to BT.601 (SD video standard) coefficients when converting incoming YUV frames back to RGB.
To prevent this distortion, we override the YUV color properties during session initialization:
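```swift
import VideoToolbox

// Tag the stream's VUI as BT.601: SMPTE-C primaries, ITU-R 601 matrix, and the
// 709 transfer curve (BT.601 uses the same transfer characteristic in practice).
func applyBT601ColorTags(to compressionSession: VTCompressionSession) -> OSStatus {
    let colorProperties: [CFString: Any] = [
        kVTCompressionPropertyKey_ColorPrimaries: kCVImageBufferColorPrimaries_SMPTE_C,
        kVTCompressionPropertyKey_YCbCrMatrix: kCVImageBufferYCbCrMatrix_ITU_R_601_4,
        kVTCompressionPropertyKey_TransferFunction: kCVImageBufferTransferFunction_ITU_R_709_2
    ]
    // Apply before the first frame is encoded so the tags appear in the first SPS
    return VTSessionSetProperties(compressionSession, propertyDictionary: colorProperties as CFDictionary)
}
```
By forcing BT.601 tagging, the encoder writes BT.601 color description values (colour primaries, transfer characteristics, and matrix coefficients) into the VUI section of the H.264 SPS, ensuring that the receiver converts YUV to RGB with matching coefficients.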
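The tint comes from inverting one matrix on data produced with another. The platform-independent sketch below uses the standard luma coefficients (Kr, Kb) of BT.601 (0.299, 0.114) and BT.709 (0.2126, 0.0722) to encode a color with one matrix and decode it with either. It assumes full-range, normalized values and ignores limited-range offsets and 8-bit quantization; the type name YCbCrMatrix is illustrative.

```swift
/// A Y'CbCr matrix defined by its luma coefficients Kr and Kb (Kg = 1 - Kr - Kb).
struct YCbCrMatrix {
    let kr: Double
    let kb: Double
    var kg: Double { 1 - kr - kb }

    static let bt601 = YCbCrMatrix(kr: 0.299, kb: 0.114)
    static let bt709 = YCbCrMatrix(kr: 0.2126, kb: 0.0722)

    /// Normalized R'G'B' in [0, 1] -> Y' in [0, 1], Cb and Cr in [-0.5, 0.5].
    func encode(r: Double, g: Double, b: Double) -> (y: Double, cb: Double, cr: Double) {
        let y = kr * r + kg * g + kb * b
        return (y: y, cb: (b - y) / (2 * (1 - kb)), cr: (r - y) / (2 * (1 - kr)))
    }

    /// Inverse of `encode`; results are not clamped, so out-of-gamut values stay visible.
    func decode(y: Double, cb: Double, cr: Double) -> (r: Double, g: Double, b: Double) {
        let r = y + 2 * (1 - kr) * cr
        let b = y + 2 * (1 - kb) * cb
        let g = (y - kr * r - kb * b) / kg
        return (r: r, g: g, b: b)
    }
}

// Pure red, encoded with BT.709 as tagged, then decoded with each matrix
let red709 = YCbCrMatrix.bt709.encode(r: 1, g: 0, b: 0)
let matched = YCbCrMatrix.bt709.decode(y: red709.y, cb: red709.cb, cr: red709.cr)
let mismatched = YCbCrMatrix.bt601.decode(y: red709.y, cb: red709.cb, cr: red709.cr)
print("matched: \(matched), mismatched: \(mismatched)")
```

Decoded with the wrong matrix, pure red comes back with red near 0.91 and a negative green component (clamped to 0 on display), while a neutral gray survives unchanged because its Cb and Cr are zero. The error scales with chroma, so the shift is most visible on saturated colors.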
4. Handling Rotation & Scaling with VTPixelRotationSession and VTPixelTransferSession
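When a user rotates their iOS device from portrait to landscape during a support session, the orientation of the captured content changes (in a broadcast extension, ReplayKit reports it through the RPVideoSampleOrientationKey sample-buffer attachment). Encoding these frames without adjusting their orientation and the encoder's dimensions causes stretched or sideways video, or decoder failures, on the receiver side.
To manage orientation updates, we route each buffer through hardware-accelerated VideoToolbox pixel sessions before encoding: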
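- Session Reusability: Create a persistent VTPixelRotationSession (available from iOS 16) for rotation, plus a VTPixelTransferSession when scaling is needed, and reuse both across frames instead of recreating them.
- Set the Rotation: When the device orientation changes, set kVTPixelRotationPropertyKey_Rotation to the matching constant (kVTRotation_0, kVTRotation_CW90, kVTRotation_180, or kVTRotation_CCW90):
  ```swift
  // Rotate 90 degrees clockwise (e.g. portrait source buffer -> landscape output)
  VTSessionSetProperty(rotationSession, key: kVTPixelRotationPropertyKey_Rotation, value: kVTRotation_CW90)
  ```
- Allocate Target Buffers: Use a CVPixelBufferPool to allocate destination buffers matching the rotated resolution.
- Hardware-Accelerated Transform: Call VTPixelRotationSessionRotateImage (and VTPixelTransferSessionTransferImage if the frame also needs scaling), which rotate and scale the frame in under 2ms. The output buffer is then fed directly into the VTCompressionSession. Because a compression session's width and height are fixed at creation, a change in output dimensions requires recreating the session.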
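The bookkeeping around the rotation step can be sketched as below: mapping the orientation reported for a frame to a clockwise angle, computing the rotated output size, and creating an IOSurface-backed CVPixelBufferPool for destination buffers. The helper names are hypothetical, and the orientation mapping assumes the reported value follows CGImagePropertyOrientation (EXIF) semantics, which should be verified on a device. The rotation call itself is omitted.

```swift
/// Clockwise rotation that displays a frame upright, keyed by the raw value of
/// CGImagePropertyOrientation (EXIF numbering: 1 up, 3 down, 6 right, 8 left).
/// Assumption: the orientation reported for each frame follows these semantics.
func clockwiseDegrees(forOrientationRawValue raw: UInt32) -> Int {
    switch raw {
    case 3: return 180  // .down
    case 6: return 90   // .right
    case 8: return 270  // .left
    default: return 0   // .up; mirrored variants are not handled in this sketch
    }
}

/// Output dimensions after a rotation by a multiple of 90 degrees.
func rotatedSize(width: Int, height: Int, degreesClockwise: Int) -> (width: Int, height: Int) {
    return degreesClockwise % 180 == 0 ? (width: width, height: height)
                                       : (width: height, height: width)
}

#if canImport(CoreVideo)
import CoreVideo

/// Reusable pool of IOSurface-backed destination buffers at the rotated size.
func makeDestinationPool(width: Int, height: Int, pixelFormat: OSType) -> CVPixelBufferPool? {
    let attributes: [CFString: Any] = [
        kCVPixelBufferPixelFormatTypeKey: pixelFormat,
        kCVPixelBufferWidthKey: width,
        kCVPixelBufferHeightKey: height,
        // IOSurface backing lets the encoder read the result without a copy
        kCVPixelBufferIOSurfacePropertiesKey: [String: Any]()
    ]
    var pool: CVPixelBufferPool?
    let status = CVPixelBufferPoolCreate(kCFAllocatorDefault, nil, attributes as CFDictionary, &pool)
    return status == kCVReturnSuccess ? pool : nil
}
#endif
```

When the rotated size differs from the current encoder dimensions, both the pool and the VTCompressionSession need to be recreated with the new width and height, since each is sized at creation.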
Through this zero-copy pipeline, the iOS client delivers high-performance, energy-efficient screen capture and encoding. In the next part of this series, we will examine the Android platform, focusing on MediaProjection and Surface-mode encoding.
