How AI can transform video offerings into Big Video

Image credit: Fred Mantel /

To accurately define the emerging status and trends of the convergent video landscape, ZTE and other players have proposed the concept of ‘Big Video’ that covers four aspects: Big Content, Big Network, Big Data, and Big Ecosystem.

The Big Video industry has benefited greatly from recent advances in artificial intelligence (AI), especially machine learning (ML) and deep learning (DL), a specialized and powerful form of ML that uses many processing layers.

Advancements in AI

AI has two major schools: rule-based expert systems and data-driven machine learning systems.

Rule-based expert systems perform efficiently and deterministically but lack the ability to learn adaptively from the data sets being processed.

Data-driven machine learning systems, especially deep learning systems, are able to solve problems that are hard to define or enumerate with explicit rules, such as natural language understanding. They can also learn autonomously from massive data sets and improve their accuracy over time.

ML and DL require massive computing power, which was not practical to obtain until cloud computing systems were commercially deployed. In recent years, ML/DL systems have been able to recognize and understand objects, faces, voices, and conversations with 95%+ accuracy, an error rate comparable to that of human classification. This capability unlocks many advanced features in Big Video.

AI for Big Content

Big Content denotes the current diversified trends in the digital content business, ranging from audio to video to virtual reality (VR)/augmented reality (AR)/mixed reality (MR), from standard definition (SD) to high definition (HD) and ultra high definition (UHD), from traditional studio-produced long-form movies and dramas to user-generated mobile-friendly portrait-mode short videos.

The content may be professionally generated content (PGC), occupationally generated content (OGC) such as livecasting of online games, or user-generated content (UGC).

To facilitate fluent multi-screen experiences and sharing on social network services, short clips and trailers of long content are essential. Traditionally, experts have produced these manually.

One smart use case of AI for Big Video is to automatically generate interesting video clips as trailers. AI is able to analyze each video frame, identify what it depicts, and mark the result with a timestamp. For example, in a 90-minute soccer game, AI is able to recognize scenes related to goals, misses, and audience cheers.
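The step from timestamped frame labels to trailer clips can be sketched roughly as follows. This is only an illustration, not a description of any vendor's implementation: the `FrameLabel` type, the label names, and the `padding` and `max_gap` parameters are all hypothetical, and the frame classifier that would produce the labels is assumed to exist elsewhere.

```python
from dataclasses import dataclass


@dataclass
class FrameLabel:
    timestamp: float  # seconds from the start of the video
    label: str        # hypothetical classifier output, e.g. "goal"


def build_clips(labels, wanted, padding=5.0, max_gap=10.0):
    """Merge timestamps of interesting frames into (start, end) clips.

    Each interesting frame is padded on both sides; padded windows that
    overlap or lie within max_gap seconds of each other are merged.
    """
    times = sorted(f.timestamp for f in labels if f.label in wanted)
    clips = []
    for t in times:
        start, end = max(0.0, t - padding), t + padding
        if clips and start - clips[-1][1] <= max_gap:
            # Close enough to the previous clip: extend it.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


# Illustrative input only; a real system would get these from a model.
labels = [
    FrameLabel(100.0, "goal"),
    FrameLabel(102.0, "cheer"),
    FrameLabel(300.0, "play"),
    FrameLabel(500.0, "miss"),
]
print(build_clips(labels, wanted={"goal", "miss", "cheer"}))
# [(95.0, 107.0), (495.0, 505.0)]
```

The goal and the cheer two seconds later collapse into one clip, while ordinary play is ignored, which is the behavior a highlight reel would need.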

For the audio/speech side of Big Video, a branch of AI called natural language processing (NLP) has enabled automatic generation of subtitles and closed captions based on a good understanding of the dialogue in a video.

Moreover, object-recognition (including face-recognition) technologies have made it possible to automatically tag objects with additional information. This is especially useful for VR, AR, and MR.

AI for Big Network

Big Network addresses the fact that modern video content may be delivered via many types of networks, including terrestrial broadcasting, analog/digital cable, analog/digital satellite, managed IPTV and unmanaged OTT videos over fixed broadband networks, Wi-Fi networks, and mobile data networks.

IPTV and OTT video systems are notoriously hard to maintain and manage due to their non-deterministic and non-repeatable nature. Unsupervised learning algorithms can be used to automatically detect anomalies in network operations.
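As a rough illustration of detecting abnormal behavior without labeled examples, the sketch below flags outliers in a series of network measurements using the modified z-score (median and median absolute deviation). This is a simple statistical stand-in, not one of the unsupervised learning algorithms an operator would necessarily deploy; the metric (segment download latency in milliseconds) and the function name are assumptions made for the example.

```python
from statistics import median


def find_anomalies(samples, threshold=3.5):
    """Return indices of samples whose modified z-score exceeds threshold.

    No labels are needed: the "normal" level is estimated from the data
    itself via the median, which a few extreme values barely affect.
    """
    med = median(samples)
    mad = median([abs(x - med) for x in samples])
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [
        i for i, x in enumerate(samples)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical latency samples in milliseconds, one per time interval.
latency_ms = [20, 21, 19, 22, 20, 21, 250, 20, 19, 21]
print(find_anomalies(latency_ms))  # [6]
```

The median-based score is used here instead of the mean and standard deviation because a single large spike would otherwise inflate the baseline it is being compared against.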

AI for Big Data

Big Data in the video industry covers multiple dimensions: subscribers, content/assets and network resources.

Face recognition of each subscriber via the set-top box (STB) or soft clients (e.g. smart phones/tablets) provides a smooth user experience with an individualized electronic program guide (EPG).

A content recommendation system is critical to attracting users and increasing average revenue per user (ARPU). It is based on supervised learning algorithms that learn from what a user likes and dislikes and derive a formula covering many features/properties of the user and the content.
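One way to picture "figuring out a formula" from likes and dislikes is logistic regression, sketched below with plain gradient descent. It is offered as a minimal example of supervised learning, not as the algorithm any particular recommendation system uses; the two features (whether a title is sports, whether it is drama) and the training data are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.1):
    """Fit weights from (features, liked) pairs, with liked in {0, 1}."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:
            # Prediction error for this example drives the update.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Estimated probability that the user likes an item with features x."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical viewing history: features are [is_sports, is_drama].
history = [([1, 0], 1), ([0, 1], 0), ([1, 0], 1), ([0, 1], 0)]
model = train(history)
print(predict(model, [1, 0]) > predict(model, [0, 1]))  # True
```

The learned weights are the "formula": a new title is scored by plugging its features into the same expression, and the highest-scoring titles would be recommended first.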

AI for Big Ecosystem

The Big Ecosystem of the video industry involves many parties: content producers, content aggregators, solution vendors, multi-channel video service providers/operators, and advertisers.

An interesting example here is smart AI advertisements based on image recognition and video blending. For example, appropriate advertisements or logos may be added onto an open space of the video scene and appear as naturally embedded objects.

APIs to access subscriber data and network statistics can be offered to third-party developers that create advanced features and services. For example, enterprise-oriented online education systems can utilize the content delivery network built for the operator-oriented pay TV services.

Final thoughts

AI is not a panacea that can solve all the problems in one shot. Nevertheless, with massive amounts of data to train the system, AI can become smarter and smarter.

For the Big Video industry, smart AI leads to happier users, more attractive content, better-managed networks, and more prosperous ecosystems.

Written by Weijun Lee, CTO Group, ZTE

This article is sponsored content from ZTE
