Part 3 | High-Performance Screen Capture and Codec Architecture Across Windows, macOS, and Linux
Achieving low-latency remote support on desktop operating systems (Windows, macOS, and Linux) is complex. Each platform features distinct graphics drivers, display compositors, and hardware-accelerated codec APIs.
Traditional CPU-bound screen capture combined with software encoding (e.g., using GDI/X11 grabs and x264 compression) struggles on high-resolution 2K or 4K displays. It consumes significant CPU resources, leading to interface lag and low frame rates.
The Easy Connect Suite's desktop client is built on Go (integrated with Wails for the frontend), with low-latency graphics capture and encoding pipelines implemented via native bindings: Windows DXGI & Media Foundation, macOS ScreenCaptureKit, and Linux PipeWire & VA-API. In this article, we explain the design of these pipelines.
1. Windows: DXGI Duplication and Media Foundation GPU Zero-Copy
On Windows, the primary API for high-performance screen capture is the DXGI Desktop Duplication API, introduced in Windows 8.
DXGI Capture Pipeline
By querying the display output for its IDXGIOutput5 interface and calling DuplicateOutput, we obtain an IDXGIOutputDuplication object. Each time the screen updates and we call AcquireNextFrame:
- DXGI places the captured frame directly into a GPU-backed ID3D11Texture2D texture.
- We process the texture inside GPU memory, avoiding transfers to system RAM.
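The acquire/release cycle can be sketched in Go as follows. This is an illustration under stated assumptions, not the suite's code: the duplicator interface is a hypothetical, simplified stand-in for the COM methods (the real AcquireNextFrame also returns a DXGI_OUTDUPL_FRAME_INFO and an IDXGIResource that must be queried for ID3D11Texture2D), the 16 ms timeout is an arbitrary choice, and fakeDup is a scripted test double. The two HRESULT values are DXGI_ERROR_ACCESS_LOST and DXGI_ERROR_WAIT_TIMEOUT from the DXGI headers.

```go
package main

import (
	"errors"
	"fmt"
)

// HRESULT values from the Windows SDK DXGI headers.
const (
	dxgiErrorAccessLost  uint32 = 0x887A0026 // DXGI_ERROR_ACCESS_LOST
	dxgiErrorWaitTimeout uint32 = 0x887A0027 // DXGI_ERROR_WAIT_TIMEOUT
)

var errAccessLost = errors.New("desktop duplication access lost; recreate with DuplicateOutput")

// duplicator is a hypothetical, simplified Go view of IDXGIOutputDuplication.
type duplicator interface {
	AcquireNextFrame(timeoutMs uint32) (texture uintptr, hr uint32)
	ReleaseFrame() uint32
}

// captureFrames acquires up to n frames and hands each GPU texture to process.
// process must finish with the texture (submit it to the encoder or
// CopyResource it) before ReleaseFrame returns it to DXGI.
func captureFrames(d duplicator, n int, process func(texture uintptr)) (int, error) {
	got := 0
	for got < n {
		tex, hr := d.AcquireNextFrame(16) // hypothetical wait budget (~60 Hz)
		switch hr {
		case 0: // S_OK: a new frame is available
			process(tex)
			got++
			if rel := d.ReleaseFrame(); rel != 0 {
				return got, fmt.Errorf("ReleaseFrame failed: 0x%08X", rel)
			}
		case dxgiErrorWaitTimeout:
			// Nothing changed on screen within the timeout; poll again.
		case dxgiErrorAccessLost:
			return got, errAccessLost
		default:
			return got, fmt.Errorf("AcquireNextFrame failed: 0x%08X", hr)
		}
	}
	return got, nil
}

// fakeDup replays a scripted sequence of HRESULTs for demonstration.
type fakeDup struct {
	script   []uint32
	released int
}

func (f *fakeDup) AcquireNextFrame(uint32) (uintptr, uint32) {
	hr := f.script[0]
	f.script = f.script[1:]
	return 0x1000, hr
}

func (f *fakeDup) ReleaseFrame() uint32 {
	f.released++
	return 0
}

func main() {
	d := &fakeDup{script: []uint32{0, dxgiErrorWaitTimeout, 0, dxgiErrorAccessLost}}
	n, err := captureFrames(d, 3, func(uintptr) {})
	fmt.Println(n, d.released, errors.Is(err, errAccessLost))
}
```

Treating a timeout as "nothing changed" rather than as an error is what keeps an idle desktop cheap to stream, while access loss (for example after a display-mode change or a switch to the secure desktop) requires calling DuplicateOutput again.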
Zero-Copy Hardware Encoding via Media Foundation (MFT)
To compress the captured D3D11 texture into an H.264 stream without copying it out of GPU memory, we integrate the Media Foundation H.264 encoder MFT (Media Foundation Transform):
[ DXGI Desktop Duplication ] ──► [ GPU Texture ID3D11Texture2D ]
│
▼ (Shared Graphics Device)
[ IMFDXGIDeviceManager Registry ]
│
▼ (MFCreateDXGISurfaceBuffer Wrapper)
[ Media Foundation H.264 Encoder (MFT) ]
│
▼ (GPU Hardware H.264 Annex-B Stream)
[ Socket Send Buffer ]
- Device Management: Create a D3D11 device, create an IMFDXGIDeviceManager with MFCreateDXGIDeviceManager, register the device with ResetDevice, and hand the manager to the encoder through the MFT_MESSAGE_SET_D3D_MANAGER message.
- Surface Buffer Packaging: Call MFCreateDXGISurfaceBuffer to wrap the captured ID3D11Texture2D pointer into a Media Foundation IMFMediaBuffer, then attach that buffer to an IMFSample.
- Zero-Copy Submission: Pass the sample directly to the MFT encoder through ProcessInput. The GPU encodes the texture, and the MFT outputs H.264 Annex-B bytes with minimal CPU usage.
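Because the encoder's output is a raw Annex-B byte stream, the sender has to locate NAL unit boundaries whenever it makes stream-level decisions, such as recognizing a keyframe that a newly connected viewer can start decoding from. A minimal Go sketch of that parsing step follows; the function names are hypothetical and this is not the suite's implementation.

```go
package main

import "fmt"

// H.264 nal_unit_type values (ITU-T H.264, Table 7-1) that a sender commonly inspects.
const (
	nalSlice = 1 // non-IDR coded slice
	nalIDR   = 5 // IDR (keyframe) slice
	nalSPS   = 7 // sequence parameter set
	nalPPS   = 8 // picture parameter set
)

// splitAnnexB splits an Annex-B byte stream on 3- or 4-byte start codes and
// returns the NAL units without their start codes (sub-slices of stream).
func splitAnnexB(stream []byte) [][]byte {
	var nals [][]byte
	start := -1 // first byte after the most recent start code
	for i := 0; i+3 <= len(stream); {
		if stream[i] == 0 && stream[i+1] == 0 && stream[i+2] == 1 {
			if start >= 0 {
				end := i
				// Zeros before 00 00 01 belong to a 4-byte start code (or trailing
				// padding); a NAL unit itself never ends in a zero byte.
				for end > start && stream[end-1] == 0 {
					end--
				}
				if end > start {
					nals = append(nals, stream[start:end])
				}
			}
			i += 3
			start = i
			continue
		}
		i++
	}
	if start >= 0 && start < len(stream) {
		nals = append(nals, stream[start:])
	}
	return nals
}

// nalType extracts nal_unit_type from the low five bits of the NAL header byte.
func nalType(nal []byte) byte { return nal[0] & 0x1F }

// containsIDR reports whether an access unit carries a keyframe slice.
func containsIDR(nals [][]byte) bool {
	for _, n := range nals {
		if len(n) > 0 && nalType(n) == nalIDR {
			return true
		}
	}
	return false
}

func main() {
	stream := []byte{
		0, 0, 0, 1, 0x67, 0x42, 0x00, 0x1F, // SPS after a 4-byte start code
		0, 0, 0, 1, 0x68, 0xCE, 0x3C, 0x80, // PPS
		0, 0, 1, 0x65, 0x88, 0x84, // IDR slice after a 3-byte start code
	}
	nals := splitAnnexB(stream)
	for _, n := range nals {
		fmt.Println("type", nalType(n), "bytes", len(n))
	}
	fmt.Println("keyframe:", containsIDR(nals))
}
```

A plain scan for 00 00 01 is safe here because H.264 emulation-prevention bytes guarantee that this pattern never occurs inside a NAL unit.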
CGO-Free COM Interoperability in Go
Instead of using complex C++ code linked via CGO (which makes cross-compiling from Linux or macOS difficult), the Easy Connect Suite implements a pure Go COM layer. windows.NewLazySystemDLL (from golang.org/x/sys/windows) loads d3d11.dll and mfplat.dll from the system directory at runtime, and syscall.SyscallN invokes their exported functions and COM interface methods. This lets the Windows client build with CGO_ENABLED=0 without adding a C++ shim to the capture path.
CPU Fallback Path
If hardware acceleration fails due to graphics driver issues, the client copies the GPU texture into a D3D11 staging texture, maps it into CPU-addressable memory with Map(), converts the pixels to BGRA, and hands the frame to OpenH264 for software encoding. If OpenH264 fails to load or to encode, a software JPEG compressor serves as the last resort so that the connection stays alive.
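The fallback order can be expressed as a chain of encoders tried in sequence. In the sketch below, the Encoder interface, failingEncoder, and errUnavailable are hypothetical placeholders for the hardware MFT and OpenH264 paths; only the JPEG stage does real work, using Go's standard image/jpeg package. It also shows the row-pitch handling a mapped staging texture requires (the mapped RowPitch can exceed width × 4) and the BGRA-to-RGBA swap that image.RGBA expects.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	"image/jpeg"
)

// errUnavailable is a placeholder failure for backends that cannot run.
var errUnavailable = errors.New("backend unavailable")

// Frame is a CPU copy of a mapped staging texture in BGRA byte order.
type Frame struct {
	Width, Height int
	Stride        int // bytes per row, like D3D11_MAPPED_SUBRESOURCE.RowPitch
	BGRA          []byte
}

// Encoder is a hypothetical interface implemented by each compression backend.
type Encoder interface {
	Name() string
	Encode(f Frame) ([]byte, error)
}

// failingEncoder stands in for a backend that failed (driver issue, missing DLL).
type failingEncoder struct {
	name string
	err  error
}

func (e failingEncoder) Name() string { return e.name }

func (e failingEncoder) Encode(Frame) ([]byte, error) { return nil, e.err }

// jpegEncoder is the last-resort backend, built on the standard library.
type jpegEncoder struct{ quality int }

func (jpegEncoder) Name() string { return "jpeg" }

func (j jpegEncoder) Encode(f Frame) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, f.Width, f.Height))
	for y := 0; y < f.Height; y++ {
		src := f.BGRA[y*f.Stride : y*f.Stride+f.Width*4] // skip row padding
		dst := img.Pix[y*img.Stride : y*img.Stride+f.Width*4]
		for x := 0; x < len(src); x += 4 {
			// BGRA -> RGBA, forcing full opacity.
			dst[x], dst[x+1], dst[x+2], dst[x+3] = src[x+2], src[x+1], src[x], 0xFF
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: j.quality}); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// encodeWithFallback tries each backend in order and returns the first success.
func encodeWithFallback(chain []Encoder, f Frame) (string, []byte, error) {
	if len(chain) == 0 {
		return "", nil, errors.New("no encoders configured")
	}
	var errs []error
	for _, enc := range chain {
		out, err := enc.Encode(f)
		if err == nil {
			return enc.Name(), out, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", enc.Name(), err))
	}
	return "", nil, errors.Join(errs...)
}

func main() {
	// 2x2 frame with a 16-byte row pitch (8 bytes of padding per row).
	f := Frame{Width: 2, Height: 2, Stride: 16, BGRA: make([]byte, 32)}
	chain := []Encoder{
		failingEncoder{"mft-hardware", errUnavailable},
		failingEncoder{"openh264", errUnavailable},
		jpegEncoder{quality: 75},
	}
	name, out, err := encodeWithFallback(chain, f)
	fmt.Println(name, len(out) > 0, err)
}
```

A production client would more likely remember which backend failed and skip it on later frames instead of retrying it every frame; that bookkeeping is omitted here.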
2. macOS: ScreenCaptureKit GPU Capture
On macOS, Apple provides ScreenCaptureKit (available since macOS 12.3) for high-performance capture; the client uses it on macOS 13 and later.
- Granular Frame Capture: ScreenCaptureKit allows selecting specific windows, applications, or displays for recording. It delivers frames as CoreVideo CVPixelBuffer objects backed by IOSurface GPU memory.
- VideoToolbox Integration: We use CGO bindings to interface with ScreenCaptureKit's Objective-C APIs and pass each output CVPixelBuffer directly to a VideoToolbox compression session for hardware-accelerated H.264 encoding. Because the pixels never leave GPU-accessible memory, this zero-copy path keeps both CPU load and power consumption low.
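One practical detail when the macOS stream must match the Annex-B output of the Windows path: VTCompressionSession emits NAL units with big-endian length prefixes (AVCC layout) rather than start codes, and keeps SPS/PPS in the sample's format description, where CMVideoFormatDescriptionGetH264ParameterSetAtIndex can read them together with the prefix length. Assuming the client normalizes to Annex-B (the article does not say how it handles this), the Go side of the CGO bridge could rewrite each sample as in this sketch; avccToAnnexB is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

var annexBStartCode = []byte{0, 0, 0, 1}

// avccToAnnexB rewrites length-prefixed (AVCC) NAL units as start-code-delimited
// (Annex-B) ones. lengthSize is the prefix width in bytes, usually 4, as reported
// by CMVideoFormatDescriptionGetH264ParameterSetAtIndex.
func avccToAnnexB(avcc []byte, lengthSize int) ([]byte, error) {
	if lengthSize < 1 || lengthSize > 4 {
		return nil, fmt.Errorf("invalid NAL length size %d", lengthSize)
	}
	out := make([]byte, 0, len(avcc)+8)
	for len(avcc) > 0 {
		if len(avcc) < lengthSize {
			return nil, errors.New("truncated NAL length prefix")
		}
		n := 0
		for _, b := range avcc[:lengthSize] { // big-endian length
			n = n<<8 | int(b)
		}
		avcc = avcc[lengthSize:]
		if n > len(avcc) {
			return nil, fmt.Errorf("NAL length %d exceeds remaining %d bytes", n, len(avcc))
		}
		out = append(out, annexBStartCode...)
		out = append(out, avcc[:n]...)
		avcc = avcc[n:]
	}
	return out, nil
}

func main() {
	// Two illustrative NAL units with 4-byte length prefixes (payload bytes are arbitrary).
	sample := []byte{0, 0, 0, 3, 0x65, 0x88, 0x84, 0, 0, 0, 2, 0x41, 0x9A}
	out, err := avccToAnnexB(sample, 4)
	fmt.Printf("% x %v\n", out, err)
}
```

The SPS and PPS read from the format description would be emitted the same way, each behind a start code, ahead of every keyframe.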
3. Linux: PipeWire DMA-BUF Capture and VA-API Dynamic Loading
Linux desktop environments are transitioning from X11 to Wayland. To support both display servers, we integrate PipeWire and VA-API (Video Acceleration API).
PipeWire DMA-BUF Zero-Copy Capture
- PipeWire Subscription: The client subscribes to screen updates from the PipeWire service.
- DMA-BUF Sharing: The Wayland compositor (such as Mutter or KWin) passes a DMA-BUF file descriptor to our client, pointing to the frame's GPU memory allocation.
- Hardware Compression: We pass the file descriptor directly to the VA-API encoder, avoiding memory copies.
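The file descriptor alone does not describe a frame: an importer such as VA-API (for example through a VADRMPRIMESurfaceDescriptor with VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2) also needs the DRM pixel format, format modifier, row pitch, and plane offset that accompany the buffer. The sketch below models that metadata for a single-plane frame and assumes the caller has already translated PipeWire's negotiated SPA video format into a DRM fourcc; dmabufFrame and validate are hypothetical, while the fourcc packing mirrors the fourcc_code macro in drm_fourcc.h.

```go
package main

import "fmt"

// fourcc packs four ASCII characters like drm_fourcc.h's fourcc_code macro
// (first character in the least significant byte).
func fourcc(a, b, c, d byte) uint32 {
	return uint32(a) | uint32(b)<<8 | uint32(c)<<16 | uint32(d)<<24
}

var (
	drmFormatXRGB8888 = fourcc('X', 'R', '2', '4') // DRM_FORMAT_XRGB8888
	drmFormatARGB8888 = fourcc('A', 'R', '2', '4') // DRM_FORMAT_ARGB8888
	drmFormatNV12     = fourcc('N', 'V', '1', '2') // DRM_FORMAT_NV12
)

// dmabufFrame is a hypothetical description of one single-plane DMA-BUF frame.
type dmabufFrame struct {
	FD            int    // DMA-BUF file descriptor from the PipeWire buffer
	Width, Height uint32 // frame size in pixels
	Fourcc        uint32 // DRM pixel format
	Modifier      uint64 // DRM format modifier (tiling/compression layout)
	Offset        uint32 // byte offset of the plane within the buffer
	Stride        uint32 // bytes per row
}

// fourccString turns a packed fourcc back into its four characters.
func fourccString(f uint32) string {
	return string([]byte{byte(f), byte(f >> 8), byte(f >> 16), byte(f >> 24)})
}

// validate performs basic sanity checks before the fd is handed to an encoder.
// This sketch only accepts 32-bit RGB formats.
func (f dmabufFrame) validate() error {
	if f.FD < 0 {
		return fmt.Errorf("invalid fd %d", f.FD)
	}
	switch f.Fourcc {
	case drmFormatXRGB8888, drmFormatARGB8888:
		if f.Stride < f.Width*4 {
			return fmt.Errorf("stride %d too small for %d px of %s", f.Stride, f.Width, fourccString(f.Fourcc))
		}
		return nil
	default:
		return fmt.Errorf("unsupported format %s", fourccString(f.Fourcc))
	}
}

func main() {
	f := dmabufFrame{FD: 42, Width: 1920, Height: 1080, Fourcc: drmFormatXRGB8888, Stride: 7680}
	fmt.Printf("%s %#x %v\n", fourccString(f.Fourcc), f.Fourcc, f.validate())
}
```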
[ Linux Compositor (Wayland/X11) ]
│
▼ (PipeWire GPU Buffer FD Transfer)
[ DMA-BUF File Descriptor ]
│
▼ (Direct GPU Memory Mapping)
[ VA-API Hardware Encoder ] ──► H.264 Stream
Dynamic Loading with dlopen/dlsym
Linux setups vary widely. Linking against libpipewire-0.3.so or libva.so at build time would make the dynamic loader refuse to start the app on systems where those libraries are missing.
To achieve portable binaries, we use a dynamic loader:
- Runtime dlopen: The client attempts to load libpipewire-0.3.so and libva.so at runtime.
- Symbol Mapping: If loading succeeds, it binds the function pointers and enables the PipeWire and VA-API hardware pipelines.
- Graceful Fallback: If loading fails, the client falls back to X11 screenshot polling. For compression, it first attempts to download and load Cisco's OpenH264 library at runtime; if that also fails, software JPEG encoding serves as the last resort, so the client still runs even in minimal server environments.
By utilizing dedicated hardware encoders and system-level capture APIs, the Easy Connect Suite provides high-speed screen sharing across desktop environments. In the next part, we will cover hardware-accelerated decoding and rendering in the browser using WebCodecs.
