Now is the time to make LLMs cheaper to run and less crazy


OpenAI has admitted that the limits of throwing massive compute and massive data at GPT and large language models (LLMs) may already have been reached. This means that to make these systems useful, ways need to be found to implement them cheaply and to make them less crazy.

At an event last week at MIT, Sam Altman, CEO of OpenAI, confessed that making the models bigger would no longer make them better, leading many critics to point out that we are already at the point of diminishing returns.

This was already evident from OpenAI’s “research paper” on GPT-4, where a probable order-of-magnitude increase in the size of the model produced only linear improvements in performance. This is typical of a technology whose maximum potential is close to being reached, which is something I have been expecting in deep learning for some time.

More ChatGPT lookalikes are coming

With everyone rushing headlong into this technology, pretty soon there are going to be a large number of ChatGPT lookalikes, all of which will consume vast resources and all of which will be equally crazy.

That means that the value in AI research will quickly move from making the models bigger to making the models cheaper and easier to implement, as well as reducing the hallucinations as much as possible.

This is what will be required to make the two commercial use cases (beyond entertainment) that I have identified become a reality: (1) the cataloguing of data within an enterprise, and (2) the man-machine interface in the vehicle.

Both of these make use of the ability of LLMs to accurately understand the request being made as well as the context and circumstance of the request, which means the model has a very good idea of what it is being asked to do. This has been a limitation of chatbots to date, and the main reason why they are only used for telling the time, turning on lights and playing music.

LLM pros and cons

LLMs are very good at ingesting large amounts of information which can be easily retrieved without having to put the data into a database or label it. The problem is that when they are asked something that they do not specifically know the answer to, they convincingly make stuff up. Consequently, the user needs to double-check everything that they produce.
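One common way to contain this, without retraining the model, is to ground its answers in retrieved text and have it refuse when nothing relevant is found. The sketch below is a minimal illustration of that idea in Python; the toy corpus, the word-overlap scoring and the refusal threshold are my own assumptions for demonstration, not any vendor's actual implementation.

# Illustrative sketch: retrieval-grounded answering to contain hallucination.
# The corpus, scoring function and refusal threshold are assumptions for
# demonstration only.

def score(query: str, passage: str) -> float:
    """Crude relevance score: fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def grounded_answer(query: str, corpus: list[str], llm, threshold: float = 0.5) -> str:
    """Only let the model answer from retrieved text; otherwise refuse."""
    best = max(corpus, key=lambda p: score(query, p))
    if score(query, best) < threshold:
        return "I don't know - nothing in the indexed documents covers this."
    prompt = (
        "Answer using ONLY the context below. If the context is insufficient, "
        "say you don't know.\n\nContext:\n" + best + "\n\nQuestion: " + query
    )
    return llm(prompt)  # llm is any text-completion callable

if __name__ == "__main__":
    docs = ["The Q3 network outage was caused by a misconfigured BGP route."]
    # Stand-in "model" that just echoes the tail of the prompt it was given
    print(grounded_answer("What caused the Q3 outage?", docs, llm=lambda p: p[-120:]))

In practice the crude word-overlap score would be replaced by embedding search, but the principle is the same: the model is only allowed to answer from material it has actually been shown.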

LLMs are also very expensive to deploy. GPT-3, with 175 billion parameters, needs around 800 GB of storage and substantial compute resources to execute requests in a timely manner. It is here that I think the valuable innovations are going to be made: the more the hallucinations can be contained (or at least identified) and the cost of running the models reduced, the more practical they become for real-world use cases. Real-world use cases lead to revenue, which, at the end of the day, is why almost everyone is in business.
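To put the storage figure in context, a rough back-of-envelope calculation (my arithmetic, using only the published 175-billion-parameter count) shows why numerical precision is the first lever for cutting serving cost:

# Back-of-envelope memory footprint for serving an LLM at different precisions.
# 175e9 parameters is the published GPT-3 size; the bytes per parameter are
# standard for fp32/fp16/int8/int4 and ignore activations and the KV cache.

PARAMS = 175e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: ~{gb:,.0f} GB of weights")

# fp32: ~700 GB  (roughly the 800 GB figure once overheads are added)
# fp16: ~350 GB
# int8: ~175 GB
# int4: ~88 GB   (still far too large for a phone)

Quantization alone does not get a 175-billion-parameter model onto a handset, which is why smaller purpose-built models and dedicated inference silicon matter just as much as the software tricks.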

In the enterprise use case, shared models are already proving to be problematic as there is potential for data leakage. In the vehicle use case, a voice service based in the cloud is not reliable enough. Hence, both of these use cases are going to require an instance of the LLM to be either deployed on-site (i.e. at the edge or on device) or in a private cloud.

The path to monetizing LLMs

This means that ways need to be found to cost-effectively implement LLMs at the edge of the network and, in many instances, on the device itself. This – combined with addressing (or at least containing) the hallucination problem – is how LLMs move from the realm of wild speculation into revenue-generating reality.
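As a concrete illustration of what on-device or private-cloud deployment can look like today, here is a minimal sketch that loads an open-weights model in 8-bit precision using the Hugging Face transformers and bitsandbytes libraries. The model name is a placeholder, and whether the result fits on the target hardware depends entirely on the model chosen and the memory available.

# Minimal sketch: running a quantized open LLM locally instead of calling a
# shared cloud API, so prompts and data never leave the premises.
# Assumes the transformers, accelerate and bitsandbytes libraries plus a
# CUDA-capable GPU; "open-llm-7b" is a placeholder, not a real checkpoint.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-llm-7b"  # placeholder for whichever open-weights model is chosen
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # spread layers across whatever local hardware exists
    load_in_8bit=True,   # 8-bit weights: ~1 byte per parameter instead of 2-4
)

prompt = "Summarize the open maintenance tickets for cell site 1142."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))

The point is less the specific libraries than the architecture: the weights, the prompts and the data all stay on hardware that the enterprise or the vehicle maker controls.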

There is still a lot of work to be done. While many people can imagine the use cases, hardly anyone knows how to deploy these models in practice.

Nvidia remains the go-to place for those wanting to invest in this craze, but I am starting to look for companies that can mitigate the limitations of these systems and deploy them efficiently on edge devices. While Nvidia seems to have the training space locked up, there is a great opportunity in inference which, at scale, could be a much larger market than training.
