There are over 600 million Internet users in India, but only a fraction of this population is fluent in English. However, most online services and much of the content on the web are currently exclusively available in English.
This language barrier continues to contribute to a digital divide in the world’s second largest Internet market, one that has limited the view of the World Wide Web for hundreds of millions of users to a few websites and services.
It is therefore not surprising that US tech giants, which are counting on emerging markets like India to continue their growth, are increasingly trying to make the web and their services accessible to more people.
Case in point: A feature provided by Google to quickly translate the content of a web page from English to Indian languages has been used more than 17 billion times by users in India over the past year.
Google, which has led this effort so far, unveiled some of its new initiatives on Thursday. The company, which counts India as its largest market by users and this year pledged to invest more than $10 billion in the country over the next few years, said it plans to invest more in machine learning and artificial intelligence at its research center in India and to make its AI models accessible to everyone in the ecosystem. The company also plans to partner with local startups that serve users in local languages and to “dramatically” improve the experience of Google products and services for Indian language users.
On that last front, the company today announced a series of changes it is rolling out across some of its services so that they support more local languages, and presented a completely new approach it is taking to translate languages.
Product changes
Users will now be able to view search results in Tamil, Telugu, Bangla, and Marathi, in addition to the English and Hindi that are currently available. The addition comes four years after Google added the Hindi tab to the search page in India. The company said that the volume of Hindi search queries grew more than 10 times after the introduction of this tab. If someone prefers to view results in Tamil, for example, they will now be able to set the Tamil tab next to English and quickly switch between the two.
Getting search results in a local language is helpful, but people often want to query in those languages as well. Google says it has found that typing in a language other than English is another challenge users face today. “As a result, many users search in English even if they actually prefer to see the results in a local language that they understand,” the company said.
To address this challenge, Search will begin displaying relevant content in supported Indian languages where applicable, even if the local language query is written in English. The feature, which the company plans to roll out over the next month, supports five Indian languages: Hindi, Bengali, Marathi, Tamil and Telugu.
Google is also making it easier for users to quickly change the preferred language in which they see results in an app without altering the device’s language settings. The feature, which is currently available in Discover and Google Assistant, will now roll out in Maps, which supports nine Indian languages.
Similarly, the Homework feature in Google Lens, which lets users take a picture of a math or science problem and then delivers the answer and guides students through the steps to get there, now supports Hindi. India is the largest market for Google Lens, said Nidhi Gupta, a senior product manager at Google India, at the event.
Jayanth Kolla, chief analyst at consultancy Convergence Catalyst, said the new Google Lens feature could pose a threat to some Indian startups like Doubtnut, backed by Sequoia Capital, which operates in a similar space.
MuRIL
Google executives also detailed a new language AI model, which they call Multilingual Representations for Indian Languages (MuRIL), that offers more efficiency and accuracy in handling transliteration, spelling variations, mixed languages and other nuances of languages. MuRIL supports transliterated text, such as Hindi written in Roman script, something that was lacking in previous models of this type, said Partha Talukdar, a research scientist at Google Research India, at a virtual event on Thursday.
The company said it trained the new model on Wikipedia articles and text from a data set called Common Crawl. It also trained it on transliterated text from, among other sources, Wikipedia (fed through Google’s existing neural machine translation models). The result is that MuRIL handles Indian languages better than older, more general language models, and can deal with letters and words that have been transliterated, that is, written using the closest corresponding letters from a different alphabet or script.
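To make the idea of transliteration concrete, here is a minimal Python sketch that maps a few Devanagari letters to their closest Roman equivalents. It is only an illustration and does not reflect how MuRIL or any Google system works. The lookup tables and the `transliterate` function are hypothetical, cover only a handful of characters, and ignore many rules of real Hindi romanization.

```python
# Toy Devanagari-to-Roman transliteration (illustrative only).
# The tables below are assumptions covering a few letters, not a full scheme.
CONSONANTS = {"न": "n", "म": "m", "स": "s", "त": "t", "क": "k", "र": "r", "ह": "h"}
# Dependent vowel signs replace the consonant's inherent "a".
VOWEL_SIGNS = {"\u093e": "aa", "\u093f": "i", "\u0947": "e", "\u094b": "o"}
VIRAMA = "\u094d"  # removes the inherent "a" from the preceding consonant


def transliterate(text: str) -> str:
    out = []
    inherent = False  # True while the last piece still ends in an inherent "a"
    for ch in text:
        if ch in CONSONANTS:
            # Each consonant carries an inherent "a" by default.
            out.append(CONSONANTS[ch] + "a")
            inherent = True
        elif ch in VOWEL_SIGNS and inherent:
            # Swap the inherent "a" for the explicit vowel.
            out[-1] = out[-1][:-1] + VOWEL_SIGNS[ch]
            inherent = False
        elif ch == VIRAMA and inherent:
            # Drop the inherent "a" so the consonants form a cluster.
            out[-1] = out[-1][:-1]
            inherent = False
        else:
            # Anything outside the toy tables passes through unchanged.
            out.append(ch)
            inherent = False
    return "".join(out)


print(transliterate("नमस्ते"))  # namaste
```

Real users romanize the same word in many different ways, which is the kind of spelling variation a fixed lookup table like this one cannot capture.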
Talukdar noted that the previous model Google relied on proved unscalable because the company had to build a model for each language separately. “Building specific language models for each and every task is not resource efficient as we often don’t have training data for tasks like this,” he said. MuRIL significantly outperforms the previous model, by 10% on native text and 27% on transliterated text. MuRIL, which was developed by Google executives in India and has been in use for about a year, is now open source.
One of the many tasks MuRIL is good at is determining the sentiment of a sentence. For example, “Achha hua account bandh nahi hua” would previously be interpreted as having a negative meaning, but MuRIL correctly identifies it as a positive statement, Talukdar said. Or take the ability to distinguish a person from a place: “Shirdi ke sai baba” would previously be interpreted as a place, which is wrong, but MuRIL correctly interprets it as a person.