In many countries, streaming data now makes up a considerable portion of all data in use, with video forecast to account for 82% of all internet traffic by 2022, according to Cisco's Visual Networking Index. (1)
In 2019, internet users watched 1.1 billion hours of live video, and the video streaming market is expected to reach $184.3 billion by 2027, according to Grand View Research.
Livestreamed data has become increasingly popular, mirroring the rise to prominence of Netflix, Disney+, HBO Max, YouTube celebrities, TikTok stars, Zoom calls, as well as livestreamed e-commerce and a slew of emerging interactive services. The latter include video games, in-game live events and e-sports tournaments, which have gone mainstream.
Video streaming itself can be split into two broad categories – live video streaming and so-called non-linear video streaming.
Live video streaming involves the transmission of real-time content; a good example is Twitch, a livestreaming service for gamers. Non-linear video streaming is on-demand in nature: viewers can record or download videos on their televisions, computers, or smartphones and watch them later. It offers advantages such as anytime viewing, large storage capacity, and recording for convenience and series linking. Netflix is the standout example.
From a technical standpoint, those 1.1 billion hours of live video equate to 1.65 exabytes at 1080p, or 7.92 exabytes at 4K. (2)
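As a sanity check on those exabyte figures, the per-hour rates they imply can be plugged into a short calculation. The 1080p rate of roughly 1.5GB/hour matches reference (2); the 4K rate of 7.2GB/hour is reverse-engineered from the 7.92EB figure and is somewhat higher than the 6GB/hour quoted in that same reference (decimal units, 1EB = 1 billion GB):

```python
# Back-of-the-envelope check of the exabyte figures above.
# Rates are assumptions reverse-engineered from the article's numbers:
# ~1.5 GB/hour at 1080p, ~7.2 GB/hour at 4K (decimal units: 1 EB = 1e9 GB).
HOURS = 1.1e9  # hours of live video watched in 2019

def exabytes(gb_per_hour: float, hours: float = HOURS) -> float:
    """Total volume in exabytes for a given per-hour rate in GB."""
    return gb_per_hour * hours / 1e9

print(f"1080p: {exabytes(1.5):.2f} EB")  # 1.65 EB
print(f"4K:    {exabytes(7.2):.2f} EB")  # 7.92 EB
```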
Livestreaming services place a heavy load on network infrastructure, which affects live video quality. This is particularly true as more people rely on streaming to work from home during the COVID-19 pandemic. Service providers and enterprises can apply the power of edge computing to data caching to avoid infrastructure problems: edge computing caches popular content in facilities located closer to the end user.
Market researcher IDC believes that streamed data is likely to be cached in storage media until the servers complete analytics. The amount of data stored at the edge is increasing at a faster rate than data stored in the core. (3)
The edge is expected to store critical data and insight that fuel latency-sensitive requests from endpoint transactions and services. At the same time, the edge is enabling distributed computing that performs analysis of streaming data. (4)
Any enterprise looking to build livestreaming into its business operations will have to revamp its IT systems. This is especially true for front-facing operations involved in e-commerce or customer support, but it also applies to day-to-day activities, particularly in these COVID-19-affected times, when Zoom calls have become routine.
Streamed data processing requires two layers: a storage layer and a processing layer. The storage layer needs to support record ordering and strong consistency to enable fast, inexpensive, and replayable reads and writes of large data streams. The processing layer is responsible for consuming data from the storage layer, running computations on that data, and then notifying the storage layer to delete data that is no longer needed. (5)
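The two-layer model can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory list as the "storage layer" (ordered, replayable records, in the spirit of Kafka- or Kinesis-style logs) and a simple consumer loop as the "processing layer" that tells storage when records can be deleted:

```python
# Minimal sketch of the storage layer / processing layer split described above.
from collections import deque

class StreamStorage:
    """Ordered, replayable record log (stand-in for a real streaming store)."""
    def __init__(self):
        self.records = deque()   # (offset, record) pairs, strictly ordered
        self.next_offset = 0

    def append(self, record):
        self.records.append((self.next_offset, record))
        self.next_offset += 1

    def read_from(self, offset):
        # Replayable read: any consumer can re-read from a past offset.
        return [(o, r) for o, r in self.records if o >= offset]

    def trim(self, up_to_offset):
        # The processing layer notifies storage which records to delete.
        while self.records and self.records[0][0] < up_to_offset:
            self.records.popleft()

# Processing layer: consume, compute, then release consumed storage.
storage = StreamStorage()
for frame in ["frame-0", "frame-1", "frame-2"]:
    storage.append(frame)

processed = [r.upper() for _, r in storage.read_from(0)]
storage.trim(storage.next_offset)    # all consumed records can be deleted
print(processed)                     # ['FRAME-0', 'FRAME-1', 'FRAME-2']
print(len(storage.records))          # 0
```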
Implementing livestreamed business services requires both edge computing and cloud computing. Streamed data needs to be processed sequentially and incrementally, on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics, including correlations, aggregations, filtering, and sampling.
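Record-by-record processing over a sliding time window can be illustrated as follows. This is a hedged sketch, assuming each record is a (timestamp-in-seconds, value) pair such as a per-second viewer count; the 60-second window length is illustrative:

```python
# Sliding-window aggregation over a stream of (timestamp, value) records.
from collections import deque

def sliding_window_average(stream, window_seconds=60):
    """Yield (timestamp, trailing average) for each incoming record."""
    window = deque()  # records currently inside the trailing window
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict records that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)

records = [(0, 100), (30, 110), (59, 120), (90, 130)]
for ts, avg in sliding_window_average(records):
    print(ts, avg)
# 0 100.0
# 30 105.0
# 59 110.0
# 90 125.0
```

The same incremental pattern extends to the other analytics mentioned above: filtering and sampling drop records instead of averaging them, and correlations or other aggregations maintain richer state per window.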
According to Seagate’s Rethink Data report, organizations periodically transfer about 36% of their data from the edge to the core on average. Within only two years, this percentage will grow to 57%. The volume of data immediately transferred from edge to core will grow from 8% to 16%. To accommodate this increase, data management plans must enable much more significant data movement: from endpoints, through the edge, to public, private, or industry clouds. (6)
Digging a little deeper on the technical side of the storage layer, if you look at some of the world’s most prominent content-driven companies, like Netflix or Facebook, and how they manage long-term storage and instant access, object storage is the common feature. Object storage provides more than just storage; it can be thought of as a cross between a web server, a content delivery network, and an asset management solution.
Object storage has rapidly become the standard for capacity storage, augmenting and displacing file storage thanks to better economics and scalability. Applications benefit from the greater intelligence incorporated in data sets, and object stores provide this intelligence. The main storage types are block, file, and object. Block remains critical for performance-sensitive, mission-critical applications. File has serviced legacy applications and provided a robust architecture for years. Object storage is focused on new application development in combination with block storage, and a number of legacy file applications are migrating to object storage to take advantage of the economies of scale it enables. (7)
Netflix streaming uses approximately 250MB-1GB per hour, depending on which quality setting is used. The lowest quality setting on Netflix uses around 5MB/minute or 300MB/hour, the medium setting chews through 9MB/minute or 540MB/hour, while the high-quality setting churns through 17MB/minute, a full gigabyte for each hour of Netflix consumed. Most programs run for about an hour, so that’s anywhere from 250MB to 1GB of data for each episode of Star Trek: Discovery or The Flight Attendant.
FaceTime and Zoom video calls consume data too: a FaceTime video call uses around 3MB/minute, so about 180MB for that hour-long chat with grandma. As video conferencing becomes a more popular way to stay connected with work colleagues and family, data requirements will increase.
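The per-minute rates quoted above convert straightforwardly into per-hour and per-episode figures. A small helper makes the arithmetic explicit (decimal MB; the rates are the article's approximations, and the service labels are illustrative):

```python
# Approximate per-minute data rates quoted in the text (MB/minute).
RATES_MB_PER_MIN = {
    "netflix_low": 5,      # ~300MB/hour
    "netflix_medium": 9,   # ~540MB/hour
    "netflix_high": 17,    # ~1GB/hour
    "facetime": 3,         # ~180MB/hour
}

def usage_mb(service: str, minutes: float) -> float:
    """Approximate data usage in MB for a given service and duration."""
    return RATES_MB_PER_MIN[service] * minutes

print(usage_mb("netflix_high", 60))  # 1020 MB, roughly 1GB per hour-long episode
print(usage_mb("facetime", 60))      # 180 MB for an hour-long call
```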
Meanwhile, you have the data-producing devices at the edge, coupled with storage and compute/analytics. The compute/analytics can range from a stream processor such as Splunk's Data Stream Processor (DSP) to a deep neural network model, but the main point is that extract, transform, load (ETL) processing and insight generation happen at the remote edge. These instances are containerized and managed with Kubernetes as data pipelines.
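The ETL pattern run at the edge can be sketched with plain generator stages. This is an illustration only, assuming sensor readings arrive as dicts; the field names and threshold are hypothetical, not any specific product's API:

```python
# Hedged sketch of an extract-transform-load pipeline at the edge.
def extract(raw_readings):
    """Extract: pull out only the fields downstream stages need."""
    for r in raw_readings:
        yield {"sensor": r["id"], "value": r["value"]}

def transform(readings, threshold=100.0):
    """Transform: keep only out-of-range readings worth shipping to the core."""
    for r in readings:
        if r["value"] > threshold:
            yield {**r, "alert": True}

def load(readings, sink):
    """Load: write results to a local store or upload queue for the cloud."""
    for r in readings:
        sink.append(r)

sink = []
raw = [{"id": "vibration-1", "value": 80.0},
       {"id": "vibration-2", "value": 130.0}]
load(transform(extract(raw)), sink)
print(sink)  # [{'sensor': 'vibration-2', 'value': 130.0, 'alert': True}]
```

In practice each stage would run as a container in a Kubernetes-managed pipeline, with the sink feeding a local object store or a queue for upload to the cloud.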
For its part, edge storage needs to be built from completely disposable physical infrastructure: if a unit suffers an outage, whether due to a power interruption or another glitch, there should be no data loss. Critical data is stored in the public cloud so that individual hardware elements at the edge can be treated as expendable, since they are more exposed to the rigors of life in the field. Fortunately, with expanding production and falling costs, deploying such endpoints has become easier and cheaper, so even if there is an outage, replacement is quick and easy.
A good example would be the deployment of edge computing on an oil rig in the North Sea, where conditions can be extremely harsh, with high winds, big waves and the occasional storm. By operating on the rig, data can be gathered from multiple sensors on drilling equipment, then collected and processed in situ to maintain optimum operations. Only occasionally does the data need to be uploaded and transferred to the cloud. If the edge computing unit is damaged, it can be quickly replaced.
Similarly, in a hot and dry mining environment in Western Australia, edge computing can be used to monitor rail transport to and from the mine and make sure all the iron ore carrying wagons are performing within specification, and there is no danger of derailment. Livestreamed video footage of each wagon’s couplings can be monitored and processed with machine learning on the train to ensure everything is operating within specification and even be used for preventative maintenance. Such data can also be submitted to the authorities to satisfy Australia’s comprehensive workplace health and safety regulations. (8)
According to IDC, more and more livestreamed data will need to be analysed and actioned at the edge. In tandem, there is also a growing need for object storage as part of this. The shift of data’s center of gravity to the edge is being driven by emerging technologies such as AI and IoT, with the move to 5G also providing extra impetus. Businesses need to plan for scalability, data durability, and fault tolerance in both the storage and processing layers when using livestreamed data across all industries. (9)
1. Cisco Visual Networking Index (VNI), Slide 2
2. HD-quality video uses about 0.9GB per hour (720p), 1.5GB per hour (1080p) and 3GB per hour (2K). UHD (4K) quality video uses 6GB per hour, and 8K 12GB per hour. "How much mobile data does streaming media use?", Android Central
3. Rethink Data report, p. 12
4. Rethink Data report, p. 12
5. Rethink Data report, p. 12
6. Seagate 2021 Data Storage Trends, "2021 Prediction: Five top storage trends to watch", FutureCIO
7. Anecdote from an engineer who works in mining in Australia
8. Rethink Data report, p. 11
By BS Teh, Senior Vice President, Global Sales and Sales Operations, Seagate Technology