Teaching AI to see
with purpose
What if AI could truly see and understand the world? Our hierarchical, compute-aware data formats are AI-native by design, giving pixels purpose and delivering the speed and structure needed for multimodal Vision AI and real-time intelligence.
When vision becomes intelligence
Humans have always experienced the world visually. Now machines do too, as visual data dominates global information. Cameras and sensors capture everything around us, yet less than one percent is ever processed by AI. Each frame is dense, unstructured, and built for human presentation, not machine analysis, so AI pipelines move pixels without purpose.
When a model needs visual data, it retrieves everything, decodes everything, and discards most of it. This “Retrieve all, Decode all, Discard most” loop burns compute, power and time. GPUs keep getting faster, but the data feeding them has not kept pace, creating a growing data-to-tensor gap where accelerators sit idle.
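To make the cost of that loop concrete, here is a minimal Python sketch. The `fetch_frame` and `decode_frame` functions are hypothetical stand-ins for a storage client and a conventional codec, not any specific API:

```python
import numpy as np

def fetch_frame(index: int) -> bytes:
    """Hypothetical stand-in: retrieve one fully encoded frame from storage."""
    return bytes(100_000)  # placeholder payload

def decode_frame(payload: bytes) -> np.ndarray:
    """Hypothetical stand-in: decode the entire frame to raw pixels."""
    return np.zeros((2160, 3840, 3), dtype=np.uint8)  # 4K RGB

# "Retrieve all, decode all, discard most": every frame is fetched and
# fully decoded, then almost all of it is thrown away.
for i in range(300):
    frame = decode_frame(fetch_frame(i))   # ~24.9 MB of pixels decoded...
    roi = frame[1000:1128, 2000:2128]      # ...to feed a 128x128 crop (~49 KB)
    # run_model(roi)  # over 99.8% of the decoded pixels were never used
```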
The problem is not the models. It is the data. Most attempts to fix this rely on more compute, larger models and bigger clusters, which only magnify the inefficiencies in today’s visual data. AI accelerates only when the data is structured for machines, not humans.
That shift starts with the data itself. It must give AI as much as necessary and as little as possible, delivered fast and in a structure aligned with how models and GPUs process information.
“We taught machines to speak before we taught them to see. The next generation of AI will need efficient senses: data formats built for analysis, not just display.”
Guido Meardi, CEO, V-Nova
Vision AI
Why it matters
- 80% of global data traffic is visual
- <1% of visual data is analysed by AI
- Up to 80% of Visual AI time is spent on data preparation
- 50% of Visual AI latency comes from moving data
- >10% of global energy growth is driven by AI
- Up to 70% of GPU time is spent waiting for data, not processing it
Here is how we teach machines to see like humans
Our hierarchical and compute-aware formats give visual data the structure AI needs. They organise raw pixels into layers of independently queryable information that can be accessed with purpose and efficiency. The hierarchy starts with broad context and adds detail only when needed, in the regions where it’s needed, so AI understands scenes progressively and selectively, instead of processing every pixel. AI no longer needs to process millions of pixels to understand that a sky is solid blue, or spend cycles on decoding the intricate details of a tree when it is classifying human actions.
This turns video from flat frames into a structure AI can search and target. Models retrieve only the regions they need, reducing decoding, data movement and compute load. When full resolution is required, the entire hierarchy can be decoded with massive parallelism that runs efficiently on GPUs and tensor cores. The result is faster performance, lower energy use and better scalability.
By embedding hierarchy, searchability and parallel processing directly into the data, we provide a stronger and more efficient foundation for Vision AI across real-world applications.
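As an illustration of what this access pattern could look like in code, the Python sketch below models a frame as a pyramid of independently queryable levels. The `HierarchicalFrame` class and its methods are invented for illustration, not V-Nova's actual API:

```python
import numpy as np

class HierarchicalFrame:
    """Hypothetical illustration of a hierarchical, independently queryable frame.

    Level 0 is a coarse overview; each higher level adds detail. Levels and
    regions can be decoded independently, so callers pay only for what they use.
    """

    def __init__(self, height: int, width: int, levels: int = 4):
        self.levels = levels
        # Simulate the pyramid with random data; a real format would store
        # compressed residuals per level instead of full pixel planes.
        self._pyramid = [
            np.random.randint(
                0, 256,
                (height >> (levels - 1 - lod), width >> (levels - 1 - lod), 3),
                dtype=np.uint8,
            )
            for lod in range(levels)
        ]

    def decode(self, lod: int) -> np.ndarray:
        """Decode the whole frame at one level of detail (0 = coarsest)."""
        return self._pyramid[lod]

    def decode_region(self, lod: int, y: slice, x: slice) -> np.ndarray:
        """Decode only one region at one level; nothing else is touched."""
        return self._pyramid[lod][y, x]

frame = HierarchicalFrame(2160, 3840)
context = frame.decode(lod=0)                  # broad scene context, 270x480
detail = frame.decode_region(lod=3, y=slice(1000, 1128), x=slice(2000, 2128))
print(context.shape, detail.shape)             # coarse overview + one sharp region
```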
The benefits of hierarchical data formats
- Scalable by design
Adapts naturally to any resolution, region or task, from a single stream to planetary-scale systems. Add a new AI workload and the format scales with it. No new proxies, no duplicate processing.
- Smarter compute
Eliminates redundant operations and keeps GPUs focused on the core AI task, turning data pipelines into active AI infrastructure.
- AI-native efficiency
Built for multimodal and Vision AI. It connects visual, sensor and spatial data within one compute-aware hierarchy designed for intelligent perception.
- Faster response
Enables real-time perception and decision making by reducing decoding, data movement and bottlenecks between data and models.
- Sustainable performance
Minimises data movement, power use and cloud egress, achieving more useful output for the same compute budget in cloud and on-premises environments.
Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6
Discover how V-Nova’s SMPTE VC-6 accelerates vision AI with NVIDIA CUDA for faster inference, lower latency, and efficient data flow.
FAQs
What makes V-Nova’s data formats different from traditional video?
Traditional video formats such as H.264/HEVC/AV1/VVC, image formats such as JPEG or PNG, and mezzanine codecs such as JPEG XS/XL or JPEG 2000 were built for human presentation, not machine analysis. They compress data for viewing, but when AI needs to analyse it, the system must still fetch and decode every frame, only to discard most of it. Our hierarchical, compute-aware formats give data structure and purpose so AI can access only what it needs, instantly. As much as necessary, as little as possible.
They are also massively parallel by nature, and therefore AI-native, GPU-native and tensor-native, which makes them extremely fast and low in complexity.
How do hierarchical compute-aware formats help AI scale efficiently?
First, a full encode-decode cycle is faster than with a traditional format, thanks to massive parallelism. Second, these formats remove redundant operations and minimise data transfer. Instead of reprocessing every full-resolution frame, AI can scan scenes at lower detail, focus on key regions and retrieve only what matters. This saves compute, speeds up response times, reduces cost and scales efficiently across cloud and edge environments.
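As an illustrative sketch of that coarse-to-fine pattern in Python (the decoder functions below are hypothetical stand-ins, not a real SDK): a cheap pass over the coarse level flags candidate regions, and only those regions are decoded at full detail.

```python
import numpy as np

# Hypothetical stand-ins for a hierarchical decoder: in a real format the
# coarse level would be decoded directly from the bitstream, and
# decode_region_full would touch only the flagged tile's data.
def decode_coarse() -> np.ndarray:
    frame = np.zeros((270, 480, 3), dtype=np.uint8)   # mostly empty scene
    frame[120:180, 300:390] = 200                     # one region of interest
    return frame

def flag_tiles(coarse: np.ndarray, tile: int = 30) -> list[tuple[int, int]]:
    """Cheap first pass: flag coarse tiles bright enough to warrant detail."""
    return [(y, x)
            for y in range(0, coarse.shape[0], tile)
            for x in range(0, coarse.shape[1], tile)
            if coarse[y:y + tile, x:x + tile].mean() > 50]

def decode_region_full(y: int, x: int, tile: int = 30, scale: int = 8) -> np.ndarray:
    """Decode one flagged tile at full detail (coarse coords scaled up 8x)."""
    return np.zeros((tile * scale, tile * scale, 3), dtype=np.uint8)

coarse = decode_coarse()                               # one cheap full-scene pass
regions = [decode_region_full(y, x) for y, x in flag_tiles(coarse)]
print(f"decoded {len(regions)} detail regions instead of a full 2160x3840 frame")
```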
What does “AI-native” or “compute-aware” mean?
AI-native means the format is structured in a way that aligns with how machine learning systems analyse visual information. Instead of giving models a flat wall of pixels, an AI-native format provides a hierarchy of context first and detail later, similar to how humans perceive scenes. It is also designed for massive parallelism, so full-resolution decode runs efficiently on GPUs and tensor cores, making it far faster than traditional formats.
Compute-aware means the format is organised to minimise unnecessary processing. It lets applications decode only the resolution or region required for a task, which reduces GPU load, memory use and energy consumption. This allows AI stages such as detection, tracking or indexing to run faster and more efficiently, because they do not need to process full frames when a lower level of detail is enough.
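A back-of-envelope comparison makes the saving concrete. The numbers below are illustrative arithmetic only, not benchmarks:

```python
# Pixels decoded per frame under two access patterns.
full_4k    = 3840 * 2160                 # decode everything: 8,294,400 px
coarse     = (3840 // 8) * (2160 // 8)   # whole scene at 1/8 scale: 129,600 px
one_region = 512 * 512                   # one full-detail region: 262,144 px

selective = coarse + one_region          # scan coarse, refine one region
print(f"full decode:      {full_4k:>9,} px")
print(f"selective decode: {selective:>9,} px  ({full_4k / selective:.0f}x fewer)")
```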
Can hierarchical formats work with existing systems?
Yes. VC-6 and LCEVC are open standards designed to integrate easily with existing video workflows, codecs and AI tools. Organisations can keep their current infrastructure while gaining the benefits of AI-ready data.
Why does this matter for sustainability?
AI is driving a major increase in global energy use. By reducing redundant processing and unnecessary data movement, hierarchical formats deliver more useful output for the same power budget and help build a more sustainable, scalable future for Vision AI.
What does “retrieve all, decode all, discard most” mean?
Most production and AI workflows rely on traditional codecs that can only deliver full frames. Even if an application needs a small region or a lower level of detail, the system still has to retrieve the entire file, decode the entire frame and then discard most of the data. This wastes compute, bandwidth and energy, and it creates bottlenecks as resolutions and AI workloads grow.
Hierarchical and compute-aware formats work differently. They let applications access only the resolution level or region they need, avoiding full-frame decode when it is not required. This removes unnecessary processing and creates a more efficient path for editing, review and AI analysis.
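Selective access also changes what is fetched, not just what is decoded. Here is a hypothetical Python sketch of the idea (the index layout, offsets and selection logic are invented for illustration): if the bitstream carries an index of independently decodable units per level and tile, a client can fetch only the byte ranges a task needs, for example as HTTP Range requests.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """One independently decodable unit: a (level, tile) slice of the bitstream."""
    level: int
    tile: tuple[int, int]
    offset: int
    length: int

# Hypothetical per-frame index (offsets and lengths invented for illustration).
index = [
    Unit(level=0, tile=(0, 0), offset=0,      length=4_000),   # coarse base layer
    Unit(level=2, tile=(3, 7), offset=51_000, length=9_500),   # one detail tile
    Unit(level=2, tile=(3, 8), offset=60_500, length=9_200),   # neighbouring tile
]

def byte_ranges(units: list[Unit]) -> list[tuple[int, int]]:
    """Byte ranges to fetch (e.g. as HTTP Range requests) for the chosen units."""
    return [(u.offset, u.offset + u.length - 1) for u in units]

# Fetch only the base layer plus two detail tiles, instead of the whole file.
wanted = [u for u in index if u.level == 0 or u.tile[0] == 3]
print(byte_ranges(wanted))   # [(0, 3999), (51000, 60499), (60500, 69699)]
```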