AI-Native Encoding for GPU-Compute Workflows
Solving the “GenAI-to-VOD” bottleneck and accelerating all GPU-native compute workflows.
The challenge
A new content factory has emerged. Generative AI (GenAI) models at major platforms such as Meta, TikTok, YouTube, and Snap are producing massive volumes of video directly on data-center GPUs. The same applies to other GPU-native pipelines, from computer vision (CV) inference to AI-assisted post-production, where visual data is both generated and consumed by AI.
Yet these GPU-born assets often need to feed downstream processes such as inference, training, or distribution, creating multiple costly data handoffs. The problem lies in a hardware gap: high-performance compute GPUs (e.g., NVIDIA A100, H100, B200) lack the dedicated video encoders (such as NVENC) found on consumer GPUs. As a result, raw frames remain “trapped” in GPU memory (VRAM).
The only conventional route is to copy that raw frame data, gigabytes per second of it per stream, across the PCIe bus to the CPU for slow, power-intensive software encoding. This stalls the GPU, wastes energy, and inflates total cost of ownership. The result is a system-wide bottleneck known as the “tensor-to-bitstream” gap.
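To make the cost of that detour concrete, the sketch below times the device-to-host copy of a single raw 4K frame. It is illustrative only and assumes PyTorch with a CUDA device (a stand-in; the pipelines described here need not use PyTorch); multiplying the result by 60 fps shows the sustained PCIe load a CPU encoder imposes per stream.

```python
# Illustrative only: times the GPU->CPU (PCIe) copy a software encoder
# would force for every raw frame. Assumes PyTorch with a CUDA device.
import torch

assert torch.cuda.is_available()

# One 3840x2160 RGB frame, 16-bit containers for 10-bit samples (~50 MB).
frame = torch.empty(2160, 3840, 3, dtype=torch.int16, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()
start.record()
frame_cpu = frame.to("cpu")  # the device-to-host transfer in question
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
mb = frame.numel() * frame.element_size() / 1e6
print(f"{mb:.0f} MB in {ms:.2f} ms ({mb / ms:.1f} GB/s); "
      f"at 60 fps this copy repeats 60 times per second, per stream")
```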
The solution
To make AI pipelines truly efficient, the data format must be as compute-native as the models themselves.
VC-6 (SMPTE ST 2117-1) is a high-performance, hierarchical format architected for modern parallel processing of visual data such as video and still images. Delivered as a software library running directly on GPU compute cores (available for NVIDIA CUDA and for AMD/Intel via OpenCL), it bypasses the CPU bottleneck entirely.
A VC-6 “in-place” encode compresses the raw video frames within the GPU’s high-speed memory in milliseconds, immediately freeing the expensive accelerator for its next AI task. The output is a compact, AI-friendly mezzanine file.
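From the pipeline’s point of view, that flow looks roughly like the sketch below. The `vc6` module, `Encoder` class, and their parameters are hypothetical placeholder names for illustration, not the real SDK’s API.

```python
# Conceptual sketch only: `vc6`, `vc6.Encoder`, and `encode` are hypothetical
# placeholder names for a VC-6 GPU binding; the real SDK's API will differ.
import torch
import vc6  # hypothetical binding

# Frames emitted by a GenAI model are already resident in VRAM.
frames = torch.rand(8, 2160, 3840, 3, device="cuda")  # placeholder content

encoder = vc6.Encoder(width=3840, height=2160, device="cuda")
bitstream = encoder.encode(frames)  # compressed in VRAM; raw frames never cross PCIe

with open("shot_0042.vc6", "wb") as f:
    f.write(bitstream)  # only the compact mezzanine leaves the GPU
```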
Downstream AI workloads, whether for inference or training (content moderation, tagging, search), can then use VC-6’s native selective-access capabilities to decode and fetch only the data they need, such as a specific low-resolution layer (LoQ) or Region of Interest (RoI), without full-frame overhead. This breaks the “Retrieve-all, Decode-all, Discard-most” cycle and accelerates the entire AI data pipeline, from generation to inference.
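A hedged sketch of that selective-access pattern follows, using the same hypothetical `vc6` binding as above; only the LoQ and RoI concepts come from VC-6 itself, the call names do not.

```python
# Conceptual sketch: selective decode with the hypothetical `vc6` binding.
# LoQ (level of quality) and RoI are VC-6 concepts; the call names are not.
import vc6  # hypothetical binding

decoder = vc6.Decoder("shot_0042.vc6", device="cuda")

# A tagging model running at low resolution fetches only a coarse layer:
thumb = decoder.decode(frame=0, loq=2)  # e.g. a quarter-resolution layer

# A moderation model inspects one detected region at full resolution:
crop = decoder.decode(frame=0, loq=0, roi=(1024, 512, 1280, 720))  # x, y, w, h

# Neither call touches bytes outside the requested layer or region, so I/O
# and decode work scale with what the model consumes, not with frame size.
```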
Optional integrations
When subsequent transcoding for distribution is required, MPEG-5 LCEVC can serve as an enhancement layer that is efficient both computationally and in compression terms. By pairing LCEVC with base codecs such as AV1 or HEVC, the system delivers higher visual quality at lower bitrates while reducing encoding complexity and energy use. This approach, validated in independent studies (e.g., SPIE Proceedings, 2022), frees additional GPU and CPU resources and further optimises end-to-end data transfer.
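The principle is easy to see in miniature. The toy sketch below is not LCEVC itself, only an illustration of the enhancement-layer idea it builds on: a base codec carries a downscaled picture, and a sparse residual layer restores full-resolution detail on top of it.

```python
# Toy illustration of the enhancement-layer idea behind LCEVC (not the real
# codec): a base codec carries a downscaled picture; a sparse residual layer
# restores full-resolution detail on top of it.
import numpy as np

# A smooth synthetic luma plane stands in for video, which is similarly
# correlated between neighbouring pixels.
y, x = np.mgrid[0:2160, 0:3840].astype(np.float32)
full = 0.5 + 0.5 * np.sin(x / 97.0) * np.cos(y / 53.0)

base = full.reshape(1080, 2, 1920, 2).mean(axis=(1, 3))   # 2x downscale -> base codec
up = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)     # decoder-side upscale

residual = full - up                                      # enhancement layer
residual[np.abs(residual) < 0.01] = 0.0                   # quantise away tiny values

reconstructed = up + residual
print("nonzero residuals:", np.count_nonzero(residual) / residual.size)
print("max reconstruction error:", np.abs(reconstructed - full).max())
```

Because most residuals quantise to zero, the enhancement layer compresses cheaply, which is where the bitrate and complexity savings come from.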
Results and benefits
- Eliminate Pipeline Stalls:
Ultra-fast, GPU-native encoding in under 5 ms per frame for 4Kp60 on NVIDIA CUDA cores.
- Mezzanine Quality at Low Bitrates:
Achieve intra-frame mezzanine quality (VMAF 95+) at 100-150 Mbps for 4Kp60 video, slashing storage and network costs—a two-orders-of-magnitude saving versus raw video.
- Massive TCO & Energy Savings:
Cut power and cost to a fraction by eliminating CPU-bound encode farms and slow GPU-to-CPU data transfers.
- Accelerate Downstream AI:
Over 10x faster decoding on CUDA vs. CPU. Reduce I/O and processing for AI inference by up to two orders of magnitude by decoding and fetching only specific regions, low-resolution layers, and/or color planes.
- Hardware Flexibility:
Natively accelerated on CUDA, with full support for OpenCL and CPU implementations, ensuring broad deployment flexibility.
Why it matters
This use case solves the critical “GenAI-to-VOD” pipeline bottleneck and applies equally to all GPU-native AI workflows that handle visual data. By aligning the codec with the compute architecture, VC-6 turns raw video from a transfer burden into a structured, AI-native data asset ready for immediate reuse across training, inference, and distribution.
Where further delivery optimization is needed, LCEVC extends the efficiency chain, maintaining quality while reducing bitrate and complexity. Together, they create a seamless bridge between AI inference on compute GPUs and video distribution pipelines, enabling faster, leaner, and more sustainable content operations at scale.