The recent departure of a respected Google artificial intelligence researcher has raised questions about whether the company was trying to conceal ethical concerns about a key piece of A.I. technology.
The departure of the researcher, Timnit Gebru, came after Google had asked her to withdraw a research paper she had co-authored about the ethics of large language models. These models, created by sifting through huge libraries of text, help create search engines and digital assistants that can better understand and respond to users.
Google has declined to comment on Gebru’s departure, but it has referred reporters to an email to staff written by Jeff Dean, the senior vice president in charge of Google’s A.I. Research division, that was leaked to the tech newsletter Platformer. In the email, Dean said that the study in question, which Gebru had co-authored with four other Google scientists and a University of Washington researcher, didn’t meet the company’s standards.
That position, however, has been disputed by both Gebru and members of the A.I. ethics team she formerly co-led.
More than 5,300 people, including more than 2,200 Google employees, have now signed an open letter protesting Google’s treatment of Gebru and demanding that the company explain itself.
But why might Google have been particularly upset with Gebru and her co-authors questioning the ethics of large language models? Well, as it turns out, Google has quite a lot invested in the success of this particular technology.
Under the hood of all large language models is a special kind of neural network, A.I. software loosely modeled on the human brain, that Google researchers pioneered in 2017. Called a Transformer, it has since been adopted industry-wide for a wide variety of language and vision tasks.
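To make the idea concrete, here is a minimal sketch of the scaled dot-product self-attention operation at the heart of the Transformer, written in PyTorch. The toy dimensions and random inputs are illustrative assumptions, not anything from Google’s actual systems.

```python
# A minimal sketch of scaled dot-product self-attention, the core
# operation of the Transformer (illustrative toy example only).
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Each token attends to every other token, weighted by similarity.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Toy "sentence" of 4 tokens, each represented by an 8-dimensional vector.
x = torch.randn(4, 8)
out = attention(x, x, x)  # self-attention: queries, keys, values all come from x
print(out.shape)  # torch.Size([4, 8])
```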
The statistical models that these large language algorithms build are enormous, taking in hundreds of millions, or even hundreds of billions, of variables. They become very good at accurately predicting a missing word in a sentence, and it turns out that along the way they pick up other skills too, such as answering questions about a text, summarizing the key facts in a document, or figuring out which pronoun refers to which person in a passage. These tasks sound simple, but earlier language software had to be trained specifically for each one, and even then it often wasn’t very good.
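To see what “predicting a missing word” looks like in practice, here is a minimal sketch using the open-source Hugging Face transformers library and a small pretrained BERT model; the article doesn’t name any particular tooling, so both choices are assumptions for illustration.

```python
# Masked-word prediction with a pretrained BERT model, via the
# Hugging Face "transformers" library (an illustrative stand-in for
# the large proprietary models discussed in the article).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks candidate words for the blanked-out [MASK] position.
for prediction in fill_mask("The researcher published a [MASK] on the ethics of AI."):
    print(prediction["token_str"], round(prediction["score"], 3))
```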
The biggest of these large language models can do some other nifty things too: GPT-3, a large language model created by the San Francisco A.I. company OpenAI, encompasses some 175 billion variables and can write long passages of coherent text from a simple human prompt. Write just a headline and a first sentence for a blog post, and GPT-3 can compose the rest. OpenAI has licensed GPT-3 to a number of technology startups, plus Microsoft, to power their own services; one company, for instance, uses the software to let users generate full emails from just a few bullet points.
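Working with GPT-3 amounts to a simple prompt-and-completion call against OpenAI’s API. The snippet below is a hedged sketch, roughly as the Python client looked at the time; the engine name, parameters, and placeholder key are assumptions, and running it requires licensed access from OpenAI.

```python
# A sketch of prompt completion against OpenAI's GPT-3 API
# (illustrative; requires a licensed API key from OpenAI).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, issued by OpenAI

prompt = (
    "Headline: Why large language models matter\n"
    "The past two years have seen an explosion in the size of language models."
)

response = openai.Completion.create(
    engine="davinci",    # GPT-3's largest publicly offered engine at the time
    prompt=prompt,
    max_tokens=200,      # how much text to generate as a continuation
    temperature=0.7,     # some randomness, rather than pure greedy decoding
)
print(response["choices"][0]["text"])
```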
Google has its own large language model, called BERT, which it has used to help power search results in several languages, including English. Other companies are also using BERT to build their own language processing software.
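One common pattern for putting a BERT-style model to work in search, though not necessarily how Google’s own integration works, is to embed the query and the candidate documents as vectors and rank by similarity. Here is a minimal sketch using the open-source sentence-transformers library and a small BERT-derived model, both assumptions for illustration.

```python
# Ranking documents against a query with BERT-style sentence embeddings
# (one common open-source pattern, not Google's actual search stack).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small BERT-derived model

docs = [
    "How to train a neural network",
    "Holiday shipping deadlines for major carriers",
    "The ethics of large language models",
]
query = "AI ethics research"

# Encode everything into vectors, then score by cosine similarity.
doc_vecs = model.encode(docs, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_vec, doc_vecs)[0]

for doc, score in sorted(zip(docs, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```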
BERT is optimized to run on Google’s own specialized A.I. chips, known as tensor processing units (TPUs), which are available exclusively to customers of its cloud computing service. So Google has a clear commercial incentive to encourage companies to use BERT. And in general, all of the cloud computing providers are happy with the current trend toward large language models because, if a company wants to train and run one of its own, it must rent a lot of cloud computing time.
For instance, one study last year estimated that training BERT on Google’s cloud cost about $7,000. Sam Altman, the CEO of OpenAI, meanwhile, has implied that it cost many millions of dollars to train GPT-3.
And while the market for these large, Transformer-based language models is relatively small at the moment, it is poised to explode, according to Kjell Carlsson, an analyst at the technology research firm Forrester. “Of all the recent A.I. developments, these large Transformer networks are the ones that are most important to the future of A.I. at the moment,” he says.
One reason is that large language models make it far easier to build language processing tools, almost right out of the box. “With just a little bit of fine tuning, you can have customized chatbots for everything and anything,” Carlsson says. More than that, pretrained large language models can help write software, summarize text, or create frequently asked questions along with their answers, he says.
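As a taste of that out-of-the-box quality, text summarization takes only a few lines with a pretrained model. This sketch uses the Hugging Face transformers pipeline with its default summarization model; both the library and the model choice are assumptions for illustration.

```python
# Summarizing text with a pretrained model, out of the box
# (the default pipeline model is an illustrative assumption).
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default pretrained model

text = (
    "Large language models are trained on huge collections of text. "
    "Along the way they pick up skills such as answering questions, "
    "summarizing documents, and resolving pronouns, without being "
    "trained separately for each task."
)

print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])
```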
A widely cited 2017 report from the market research firm Tractica forecast that natural language processing (NLP) software of all kinds would be a $22.3 billion annual market by 2025, and that analysis was made before large language models such as BERT and GPT-3 arrived on the scene. This, then, is the market opportunity that the research Gebru co-authored called into question.
What exactly did Gebru and her colleagues say was wrong with large language models? Well, lots. For one thing, because they are trained on huge corpora of existing text, the systems tend to bake in a lot of existing human biases, particularly about gender and race. What’s more, the paper’s co-authors said, the models are so large and take in so much data that they are extremely difficult to audit and test, so some of these biases may go undetected.
The paper also pointed to the adverse environmental impact, in terms of carbon footprint, that training and running such large language models on electricity-hungry servers can have. It noted that training BERT, Google’s own language model, produced, by one estimate, about 1,438 pounds of carbon dioxide, roughly the amount emitted by a round-trip flight between New York and San Francisco.
The paper also argued that the money and effort devoted to building ever-larger language models diverts resources from work on systems that might actually “understand” language and learn more efficiently, the way humans do.
Many of the criticisms of large language models made in the paper have been raised before. The Allen Institute for AI, for instance, has published a paper examining racist and biased language produced by GPT-2, the forerunner to GPT-3.
In fact, OpenAI’s own paper on GPT-3, which won a “best paper” award at this year’s Neural Information Processing Systems conference (NeurIPS), one of the most prestigious gatherings in A.I. research, contained a meaty section outlining some of the same potential problems with bias and environmental harm that Gebru and her co-authors highlighted.
OpenAI, arguably, has as much, if not more, financial incentive to sugarcoat any faults in GPT-3. After all, GPT-3 is literally OpenAI’s only commercial product at the moment. Google, by contrast, was comfortably pulling in more than $100 billion a year before BERT ever came along.
But then again, OpenAI still functions more like a tech startup than the mega-corporation that Google’s become. It may simply be that large corporations are, by their very nature, allergic to paying big salaries to people to publicly criticize their own technology and potentially jeopardize billion-dollar market opportunities.