Meta’s new AI model can translate 200 different languages – including many low-resource ones not supported by current translation systems – thanks to the work of what CEO Mark Zuckerberg calls ‘one of the world’s fastest supercomputers.’
The company dubs its effort No Language Left Behind (NLLB) and it hopes to enable more than 25 billion translations across Meta’s apps each day.
Although more than 7,100 known languages are spoken worldwide today, many of them lack enough available data to train AI models.
These so-called low-resource languages include Egyptian Arabic, Balinese, Sardinian, Nigerian Fulfulde, Pangasinan and Umbundu, which are spoken by sizeable populations but are not well represented on the internet.
‘The AI modeling techniques we used are helping make high quality translations for languages spoken by billions of people around the world,’ Meta CEO Mark Zuckerberg said in a statement posted to Facebook.
The new model can translate 55 African languages with ‘high-quality results,’ the company states.
‘To give a sense of the scale, the 200-language model has over 50 billion parameters, and we trained it using our new Research SuperCluster, which is one of the world’s fastest AI supercomputers.
‘The advances here will enable more than 25 billion translations every day across our apps.’
‘Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it’s improving everything we do — from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone.’
‘This means that this can impact billions of people by allowing them to communicate in their own native language,’ says Marta R. Costa-jussa, a research scientist at Meta AI, in a video announcing the effort.
‘This is going to change the way that people live their lives, the way they do business, the way that they are educated, No Language Left Behind really keeps that mission at the heart of what we do — is people,’ says Al Youngblood, a user researcher at Meta AI.
For its No Language Left Behind project, the tech giant first conducted exploratory interviews with native speakers of the low-resource languages to figure out their translation needs.
Then it developed a computational model that’s trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
‘Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200,’ the team of researchers state in the abstract of the paper explaining the new AI model.
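To see where a figure like 40,000 comes from, here is a minimal sketch of the arithmetic, assuming each 'translation direction' is an ordered pair of distinct languages (source to target). The function names are illustrative and are not taken from Meta's code. With exactly 200 languages the count is 39,800, just short of the quoted figure, so 'over 40,000' would imply the benchmark counts at least 201 language varieties. That is an inference from the arithmetic, not something the article states.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    return num_languages * (num_languages - 1)


def min_languages_for(directions: int) -> int:
    """Smallest number of languages whose direction count exceeds `directions`."""
    n = 2
    while translation_directions(n) <= directions:
        n += 1
    return n


if __name__ == "__main__":
    # Directions among exactly 200 languages
    print(translation_directions(200))
    # Fewest languages needed to exceed 40,000 directions
    print(min_languages_for(40_000))
```

The count grows roughly with the square of the number of languages, which is why evaluating every direction requires a shared human-translated benchmark rather than separate test sets for each pair.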
The researchers also point to the broader benefit of bringing more low-resource languages into the fold: reducing digital inequality.
‘Given that the primary goal of NLLB is to reduce language inequities in a global context, more and more low-resource languages will be incorporated into the project (or others alike) in the long run,’ the researchers state.
Anyone can get a feel for how the new model works at Meta’s demo site.
Meta’s model covers 200 languages, many of them low-resource, meaning they are difficult for translation models to cover because there is little existing data to train the AI.
Meta made all of its evaluation benchmarks for the project open source so that researchers can dive into the data and evaluate it further.