Data silos, data pipelining and the other daily problems enterprises have with their data are discussed so often these days that they’re practically clichés. In fact, enterprise data has produced several unicorns (e.g., Databricks, Snowflake) during the last ten years. But these problems are not limited to enterprise IT and data: we’re now seeing the same issues with IoT, sensor systems and wearables. The number of data sources is growing rapidly, but the bottleneck is combining the data and converting it to a format where it can be fully utilized. There will be many winners and losers among those seeking a solution to this, and the losers will likely be the ones searching for a one-size-fits-all solution that works across every industry.
A few days ago, I was involved in a discussion about smart cities. It’s clear that more data is available all the time, such as city-level data (e.g., traffic, air quality or utilities) and building-level data (e.g., energy consumption, home automation or security). Yet, it remains a challenge to combine data from those separate silos and use it to build applications that utilize data from all kinds of sources.
Meanwhile, the problems with wearable data are similar to those concerning buildings, cities, or any other IoT system: there is currently too much emphasis on applications that use data from only one type of source. The real value comes from combining data from as many sources as possible.
What can be done to solve the issue?
One typical comment is that we need to create specific standards. Perhaps – but the reality is that standardization work is typically far too slow to solve problems like this. It is also very hard (close to impossible, in fact) to get enough commercial players to commit to a standard in this fast-evolving area. Smart-city companies have seen a lot of local and global standardization initiatives over the years, but so far, none of them have really made any difference.
In enterprise systems, we are seeing companies, usually startups, building data pipelining solutions for specific use cases, such as moving data from CRMs to SAP. Another popular enterprise data solution is to implement a data lake, where you store raw data and then build layers that convert it into formats that different systems can use.
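As a minimal sketch of the data lake idea, assume a raw zone that keeps each record exactly as its source system delivered it, plus one layer that maps those records to a shared shape. The source names, field names and the `Customer` type are hypothetical and chosen only for illustration; they do not describe any particular product.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical raw zone: payloads are stored untouched, one record per source system.
RAW_ZONE = [
    {"source": "crm", "payload": json.dumps({"CustomerName": "Acme Oy", "Id": "17"})},
    {"source": "erp", "payload": json.dumps({"cust_name": "Beta Ltd", "cust_no": 42})},
]


@dataclass(frozen=True)
class Customer:
    """Shared shape that downstream systems read from the curated layer."""
    customer_id: str
    name: str
    source: str


def to_curated(raw: dict[str, Any]) -> Customer:
    """Map one raw record to the shared shape; each source needs its own rule."""
    body = json.loads(raw["payload"])
    if raw["source"] == "crm":
        return Customer(str(body["Id"]), body["CustomerName"], "crm")
    if raw["source"] == "erp":
        return Customer(str(body["cust_no"]), body["cust_name"], "erp")
    raise ValueError(f"no mapping defined for source {raw['source']!r}")


# The curated layer is derived from the raw zone, which itself stays unchanged.
curated = [to_curated(record) for record in RAW_ZONE]
```

The point of the sketch is that every new source adds another mapping rule, which is why these layers tend to grow around specific use cases.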
The trillion-dollar question
We can list many areas where more IoT data is becoming available: for example, wearables, buildings, cities, public safety and security, military, and healthcare. But all of them are struggling to utilize their data properly. The trillion-dollar question is whether we must develop vertical-specific solutions that tailor data collection, pre-processing and formatting to each area separately, or whether there is a common solution that everyone can use.
It is hard to believe that there is one magic format that would solve all these needs. It is much more likely that we’ll see vertical-specific intermediate solutions, which most probably will have to be based on an open architecture. In practice, this means that there will be companies building industry-specific data pipelining and intelligent layers to pre-process, structure and enrich data.
These layers cannot be totally industry agnostic. Their design must take into account some critical requirements for the applications, as well as requirements for the data itself and its accuracy. For example, analyzing ECGs from wearable heart rate data has different data accuracy and synchronization requirements than measuring air pollution from air quality sensor data. And services that evaluate daily traffic flows have different requirements than applications that detect security threats and react to them rapidly.
Vertical or industry-specific data layers also need sufficient versatility, which means APIs for sensors and other IoT devices. Then you can have, for example, open source components that connect a data layer to a specific device. In practice, these could be components that read data from wearables and clinical devices for a health service data layer, or components that send data from motion sensors and cameras to security and safety data layers.
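To make the idea of connector components concrete, here is a minimal sketch of a health data layer that accepts pluggable device connectors. All names (`Reading`, `Connector`, `WearableConnector`, `ClinicalDeviceConnector`, `HealthDataLayer`) and the input formats are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Reading:
    """Common record shape the (hypothetical) health data layer stores."""
    device_id: str
    metric: str
    value: float
    unit: str
    timestamp: datetime


class Connector(Protocol):
    """Anything that can turn device output into Reading objects."""
    def read(self) -> Iterable[Reading]: ...


class WearableConnector:
    """Assumes the wearable reports (epoch_seconds, beats_per_minute) pairs."""
    def __init__(self, device_id: str, samples: list[tuple[int, int]]):
        self.device_id = device_id
        self._samples = samples

    def read(self) -> Iterable[Reading]:
        for epoch_s, bpm in self._samples:
            ts = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
            yield Reading(self.device_id, "heart_rate", float(bpm), "bpm", ts)


class ClinicalDeviceConnector:
    """Assumes the clinical device reports dict rows with ISO 8601 timestamps."""
    def __init__(self, device_id: str, rows: list[dict]):
        self.device_id = device_id
        self._rows = rows

    def read(self) -> Iterable[Reading]:
        for row in self._rows:
            ts = datetime.fromisoformat(row["time"])
            yield Reading(self.device_id, row["metric"], float(row["value"]), row["unit"], ts)


class HealthDataLayer:
    """Collects readings from all registered connectors in time order."""
    def __init__(self) -> None:
        self._connectors: list[Connector] = []

    def register(self, connector: Connector) -> None:
        self._connectors.append(connector)

    def collect(self) -> list[Reading]:
        readings = [r for c in self._connectors for r in c.read()]
        return sorted(readings, key=lambda r: r.timestamp)


layer = HealthDataLayer()
layer.register(WearableConnector("watch-1", [(1704067200, 62), (1704067260, 64)]))
layer.register(ClinicalDeviceConnector(
    "oximeter-7",
    [{"metric": "spo2", "value": 97, "unit": "%", "time": "2024-01-01T00:00:30+00:00"}],
))
readings = layer.collect()
```

In this sketch, applications would only see `Reading` objects, so supporting a new device means writing one more connector rather than changing the layer. A security data layer would probably define a different record shape and different timing rules, which is the industry-specific part.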
Finding the right balance
The successful data layers will most probably be those that strike the right balance between versatility and industry-specific tailoring. Only then can the data layer offer APIs on which valuable applications can be built. Finding that balance is difficult but crucial: it will determine which projects succeed and which fail.
The IoT, sensor, data and data-based application markets are still in a very early phase. In the near future, we will see a lot of development in those areas. They will also be some of the most exciting areas for startups. But they will need new architectures to better utilize data from different sources and to build applications for very different needs. The winners will figure out how to do this, while the losers will likely be the ones who try to build one-size-fits-all data models.