LLMs – large language models – are being developed by tech companies big and small. They aim to do good, but there are very real reasons to suppose they will widen the digital divide and drag companies towards a dark side of bias.
Last week Sundar Pichai announced that Google was introducing its most effective productivity tool yet – LaMDA.
LaMDA is an LLM that Google will embed into its systems, and it will be the LLM, not a human, that customers interact with. Soon, according to the MIT Technology Review, most, if not all, such communication will happen with an LLM.
All of which sounds great – efficient, effective, intuitive, you name it.
Yet there are potential downsides to LLMs. And the reason Google sacked two of its foremost AI ethics researchers was that they dared to point out the dark side. Literally.
Whether we like it or not, LLMs are biased.
For a start, LLMs will be trained by companies in Silicon Valley, and therefore their first language will be Silicon Valley. They will unwittingly (having no actual wits) be trained with the company’s culture embedded in the system. And whether that culture is biased towards a political viewpoint or a race or colour or sex, this inherent flaw is impossible to eliminate. Students and researchers have already shown that, with the right prompts, they can elicit everything from calls to genocide to encouragement of self-harm.
For the large tech companies such as Google and Facebook (which uses an LLM to moderate content), there is no real commercial incentive to invest the amounts of money necessary to fix the problem.
LLMs are trained on millions and millions of pages of text, scraped, for the most part, from websites. This will, of course, include websites that promote radical points of view, which the models will ingest and ‘spew back across the internet’. Timnit Gebru, one of the ethics researchers fired for refusing to retract a paper pointing out this dark side, recently illustrated the problem with an example of the war in her home country of Ethiopia.
The multitude of languages in use (none of them spoken in Silicon Valley) and the scattergun way the news was reported both contributed to amplifying the problem, which manifested itself in biased and misleading reporting and a complete lack of coherent language recognition.
One attempted solution is a project called BigScience, a collaboration of researchers from around the world who are attempting – voluntarily – to build an LLM trained on as wide a range of perspectives as possible and including, initially, several different languages.
We will know by mid-2022 whether the group achieves something worthwhile. In the meantime, companies looking to automate processes such as customer service should think hard about whether an LLM is the right choice – and whether they really want their customers talking to a monstrous, biased machine.