Traditionally, machine learning (ML) algorithms have needed all of their training data in a single central location. In many cases this is no longer optimal, for a number of reasons, and new distributed models are now coming to ML and AI. They could change the whole business.
How? It comes down to a combination of new technologies, consumer business opportunities and the way that AI devices are coming into our daily life. Can they even challenge the power of data giants by enabling user-held data models?
There are a couple of specific terms associated with distributed or decentralized ML and AI solutions, although the definitions are not totally consistent. (On a very general level, “distributed ML” can mean many kinds of solutions where data is not collected in a single centralized place to train ML algorithms for a specific AI solution.)
Distributed, decentralized, federated
One such term is federated learning. This means that a global ML model is created, but it is trained across several nodes, and the data in each node is never shared elsewhere. A central node creates an initial model (a set of parameters) and shares it with the other nodes. Each node optimizes the model with its own data and sends the optimized model back to the central node, which then builds a new global model from the models returned by all the nodes.
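The round-trip described above can be sketched in a few lines. This is a minimal, illustrative simulation of federated averaging, not any vendor's actual implementation: the "global model" is just a weight vector for a least-squares problem, and the five "nodes" are simulated in one process with synthetic data. All names and numbers here are assumptions for the sake of the example.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=20):
    """One node optimizes its copy of the global model on its own private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # the relationship all nodes' data share

# Each node holds its own data; the raw data never leaves the node.
nodes = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    nodes.append((X, y))

global_w = np.zeros(2)  # the central node's initial model (parameter set)
for _ in range(10):     # one iteration = one round of federated training
    # Each node trains the shared model on its local data...
    local_models = [local_update(global_w, X, y) for X, y in nodes]
    # ...and only the model parameters travel back to be averaged centrally.
    global_w = np.mean(local_models, axis=0)

print(global_w)  # converges toward true_w without pooling any raw data
```

The key property to notice is what crosses the network: model parameters in both directions, never the nodes' data.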
Another term is Distributed AI (DAI), a.k.a. Decentralized AI, which refers more generally to distributed solutions for AI needs. Distributed learning and distributed ML can be part of those solutions. However, there can also be reasons to distribute AI models that were trained centrally, such as privacy, security, robustness, and availability.
Federated learning became a big deal when Google published a model for it in 2017, intending to use it on mobile phones. The idea was to create ML models that could be trained with the data on each phone. As one would hope, privacy played an important role in the design, in the sense that the data on the phone need not be shared elsewhere.
That said, it’s worth remembering that in Google’s federated learning model, individual users do not really own and control the data on their phones. Rather, as with all Google services, the data remains under Google’s control. Distributed and federated models could nonetheless offer better privacy, as well as enable solutions where each user utilizes their own data.
So why are we just now talking about distributed ML and AI models? Looking at it in terms of the general benefits, there are several reasons:
- It’s either impossible or too expensive to collect and store all data in one place where everything is processed.
- Private data in particular cannot be collected in a central place and combined with other data.
- It’s important to have local solutions optimized rapidly for local use.
- Edge computing and 5G deliver better services when data is stored and processed near the user.
- There are cases where parties want to cooperate in developing optimal algorithms, but are not willing to share data.
Mobile phones and self-driving cars are good examples of why it makes sense to optimize ML models locally. It is better from a privacy perspective. There’s less data to transfer. The models might need some localization to be of any use. And local training and local models mean better availability and lower latency.
In practice, most solutions are still combinations of centralized and local components.
Data giant disruption?
Surprisingly, even when an ML model or AI solution is distributed, it is still usually owned by a single actor. Google’s model, for example, has been more about distributing learning and processing. Its intention is not really to give local data to users, give them control over how it is used, or empower them to have their own ML and AI models based on their data and shared learning from other users. Similarly, Apple utilizes federated learning to optimize its own services, such as Siri.
Big data giants have no motivation to change the models that effectively determine who owns and utilizes data. Distributed models have helped them build better solutions, train models more effectively, and offer better privacy statements to users. But none of this has changed the fundamentals of how they utilize consumer data.
That said, when technology enables something, business changes tend to follow sooner or later. Now that it’s feasible to train ML models locally, even using consumer devices and the data stored on them to improve models for everyone, a door has opened for models that empower users to actually control their data and utilize it.
Distributed business models
I wrote earlier about how we can expect to have sensors in most of our clothes, shoes, and accessories in the near future. Just think about all that data. If a consumer can collect that data for their own use, and have solutions (also based on ML and AI) to use in their daily life, this is one area where distributed ML solutions can really change the fundamentals of the business.
The alternative is that Google, Apple, and a couple of other companies manage all that data and base all AI/ML solutions on it. But this is not something people, regulators, or even clothing brands want to see.
There are also networking technology companies, including Ericsson and Nokia, that want to push new architectures for how data is transferred and processed in networks.
Distributed and federated ML models are now developing rapidly. There is a lot of research work in this area, and scientists develop new models all the time. At the same time, we’re seeing the first signs that these models can dramatically change how consumer data is utilized by enabling user-held data and apps so consumers can get value from their data for themselves.
Together with the ongoing sensorization of everything, distributed ML has opened the door to totally new data business models.