Data is the basis for many operations, but that doesn’t mean it is always reliable. Things get complicated when you don’t know which data sources are reliable and which are not. Yet we must use data all the time. Sometimes it is possible to increase accuracy at the source, but a more meaningful solution is often to build a software layer that corrects data before using it.
I wrote earlier about known and unknown things and data points. The reality is even more complex: we may know some data is relevant and available, but we don’t always know how reliable it is. We all know about opinion polls and their error margins. That is just one example, but uncertainty is attached to all data sources and to the models that utilize them.
In aeroplanes and nuclear power stations, the core systems do not necessarily trust individual sensors or data sources. There can be many reasons why a particular sensor gives incorrect data. For example, a pitot tube that measures an aircraft’s airspeed can transmit incorrect information if it freezes, which has caused several plane crashes. Today a plane typically has several pitot tubes, and software tries to draw conclusions and warn pilots if one or more give inconsistent readings.
Sometimes the situation is even more demanding: it is difficult, even impossible, to know whether data sources and sensors give accurate data and how large the error margin is. Wearable devices are an example. They can measure your exercise patterns, sleep, and body functions such as heart rate, temperature, or blood pressure. These devices are calibrated against higher-accuracy instruments during development, but it is still hard to say how accurate they are for different people in different situations. For example, even with top-level research instruments, it is not easy to measure how much light sleep, REM sleep, and deep sleep a person gets at night.
We might also have a situation where there are many sensors, but some data is missing. Combining data from different sources is a complex task, and it is also tricky to know whether the available data makes sense when combined. This can occur with many IoT sensors, or with an organization’s internal data from multiple sources used to measure processes or even financials.
It is often said that intelligence makes up only 20% of AI implementations; the rest is getting data, combining it, and correcting errors. This layer is often underestimated. I have seen projects where 95% of the data was inaccurate, incorrect, or missing.
There are several ways to increase the accuracy of data, for example:
- If we get the same data from several sources, we can use a ‘voting’ model that takes the value most sources agree on as the ‘correct’ one. Pitot tube systems often work like this.
- We can learn each source’s accuracy and take its error margin into account when correcting data. Opinion poll models often include correction factors for this.
- More complex solutions combine several data sources and draw conclusions about what the data indicates and how well the readings fit together. For example, whether a person is running can be concluded from a combination of data points from wearable devices, e.g. motion sensors, speed, heart rate, and oxygen. Another example is a phone camera system whose software improves a photo based on each camera’s pixel data, correcting individual pixels and how the pixels fit together.
- Sometimes it is possible to have a feedback loop that tells us how accurate some data is, and then use machine-learning-type models to develop correction factors and models for using the data.
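The voting idea above can be sketched in a few lines. This is a minimal illustration, not an avionics implementation: the readings, the tolerance value, and the function name are all hypothetical. Taking the median of redundant readings ignores a single bad sensor, and anything far from the consensus gets flagged.

```python
# A minimal sketch of 'voting' across redundant sensors, using
# hypothetical airspeed readings. The median ignores a single bad
# sensor; readings far from the consensus are flagged as suspect.
from statistics import median

def fuse_readings(readings, tolerance=15.0):
    """Return (consensus, suspect_indices) for redundant sensor values."""
    consensus = median(readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - consensus) > tolerance]
    return consensus, suspects

# Sensor 2 is frozen and stuck at a low value; the median still holds.
value, bad = fuse_readings([251.0, 249.5, 80.0])
print(value, bad)  # 249.5 [2]
```

With three or more sources, a single faulty one cannot drag the consensus with it, which is exactly why redundancy is built into safety-critical systems.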
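Taking error margins into account, as in the opinion-poll example, can be done by weighting each source by the inverse of its variance, so the more accurate source counts more. The poll numbers below are made up for illustration.

```python
# A sketch of combining sources with known error margins: weight each
# source by 1/variance, so tighter (more accurate) sources dominate.
def weighted_estimate(values_and_stddevs):
    """Inverse-variance weighted average of (value, stddev) pairs."""
    weights = [1.0 / (s * s) for _, s in values_and_stddevs]
    num = sum(w * v for w, (v, _) in zip(weights, values_and_stddevs))
    return num / sum(weights)

# Two hypothetical polls: 52% with ±2 points and 46% with ±4 points.
# The tighter poll gets four times the weight of the looser one.
print(round(weighted_estimate([(52.0, 2.0), (46.0, 4.0)]), 2))  # 50.8
```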
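The running example can be sketched as a toy fusion rule: no single signal is decisive, but agreement among most of them supports the conclusion. The thresholds and signal names here are invented for illustration, not taken from any real wearable.

```python
# A toy rule-based fusion sketch with hypothetical wearable readings.
# Each signal casts a vote; agreement of most signals decides.
def looks_like_running(motion_hz, speed_kmh, heart_rate_bpm):
    votes = [motion_hz > 2.0,        # rhythmic step cadence
             speed_kmh > 7.0,        # faster than walking pace
             heart_rate_bpm > 120]   # elevated heart rate
    return sum(votes) >= 2           # require most signals to agree

print(looks_like_running(2.8, 10.5, 150))  # True
print(looks_like_running(0.2, 1.0, 150))   # False: high pulse alone
```

A real system would learn such rules from data rather than hard-code thresholds, but the principle is the same: cross-check signals against what they should look like together.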
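The feedback-loop idea can be sketched as fitting a correction model from occasional reference readings, e.g. from a calibrated device worn alongside the wearable. The heart-rate pairs below are hypothetical; a simple least-squares line stands in for the "machine-learning-type model".

```python
# A sketch of a feedback-loop correction: fit reference = a*sensor + b
# from paired readings, then apply that line to future sensor values.
def fit_correction(sensor, reference):
    """Least-squares linear correction learned from paired readings."""
    n = len(sensor)
    mx = sum(sensor) / n
    my = sum(reference) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    var = sum((x - mx) ** 2 for x in sensor)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

# Hypothetical pairs: the wearable reads about 5 bpm low across the range.
correct = fit_correction([60, 80, 100], [65, 85, 105])
print(correct(90))  # 95.0
```

Once the correction is learned, the reference device is no longer needed in everyday use, which is how consumer devices can piggyback on lab-grade calibration.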
These layers that combine, correct, and smartly use data become more important as we get more data sources. One could even say it is fairly simple to create AI models once someone has built this layer to make reliable data available. It is often said that the IoT business is not really about selling sensor hardware but about managing data; what is often ignored is the critical question of getting reliable data in the first place.
It is not easy to build these layers, because each source is different, and it can also require an understanding of the data itself to analyze and integrate the sources. It is possible to make general models and tools for this, but they often need tailoring for each data source and each combination of sources.
Hand in hand with AI, these smart data-combining models and layers become a vital part of the data and AI business. Data is valuable only if it is reliable, and we can trust AI only if it uses correct data. The reality is that no data source is 100% reliable, so we need intelligence in how to use data sources correctly and optimally.