BestsellerMagazine.com - Worldwide news
Microsoft's interest in cracking the unique syntax of Indian language mixing is driven by its aim to build chat and speech bots that feel more human.
Researchers at Microsoft Research Labs in Bengaluru are studying the unique ways in which Indians mix languages in everyday conversation, and Hindi movies offer an easy source of such conversations.
A seven-member team of linguists, ethnographers, psychologists and computer scientists is poring over the scripts of films such as D-Day, Dedh Ishqiya, Dum Laga Ke Haisha, Queen, Pink, Raman Raghav 2.0, Neerja, Masaan, Kai Po Che, Udaan and Kapoor and Sons to understand how Indians mix languages in different social contexts.
“We are using 18 Hindi movie scripts to carve out about 15,000 dialogues to figure the need to mix language and the grammar behind language mixing,” said Monojit Choudhury, a researcher at Microsoft Research (MSR) India.
To understand why a team of scientists spends much of its working day going through Bollywood scripts, it helps to grasp the nuances of how Indians actually switch languages.
Choudhury cites dialogue from the Hindi film Queen, starring Kangana Ranaut as Rani. In the 2013 blockbuster, Rani, a middle-class girl from Delhi, sets off on a journey of self-discovery after her engagement falls apart and she goes on a solo honeymoon to Europe.
“Rani, whose preferred medium of communication is Hindi, starts mixing languages while conversing with friends in Europe. There is a significant increase in the usage of English as compared to Rani in Rajouri Garden, India,” said Choudhury.
“This is Hindi-English code switching, known colloquially as Hinglish.” For instance, while complaining about how lonely she feels in a foreign country, Rani says: “Mein akele road cross kar rahi hoon, akele Eiffel tower dekh rahi hoon, akele gunde ko fight kar rahi hoon” (roughly, “I'm crossing the road alone, seeing the Eiffel Tower alone, fighting goons alone”). The line mixes Hindi and English within a single sentence.
Dialogues like these are then fed into a system where they are classified into separate buckets: Hindi, English and Hinglish dialogues. That gives the system an idea of how language mixing works. It also creates profiles of how various characters mix languages according to their social context.
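The article does not describe how Microsoft's system sorts dialogues, so the following is only a minimal Python sketch of the general idea, assuming a simple word-level approach: each token is looked up in tiny, hypothetical Hindi and English word lists, an utterance is labelled Hindi, English or Hinglish depending on which languages appear in it, and the labels are tallied per speaker to form a rough character profile. The word lists and the functions `tag_word`, `bucket` and `character_profiles` are illustrative inventions, not part of any Microsoft tool.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicons for illustration only; they are NOT from Microsoft.
HINDI_WORDS = {"mein", "akele", "kar", "rahi", "hoon", "dekh", "gunde", "ko"}
ENGLISH_WORDS = {"road", "cross", "eiffel", "tower", "fight",
                 "you", "should", "get", "her", "a", "tiara"}


def tag_word(word: str) -> str:
    """Tag one token as 'hi', 'en', or 'other' (unknown or language-independent)."""
    w = word.lower().strip(".,!?'\"")
    if w in HINDI_WORDS:
        return "hi"
    if w in ENGLISH_WORDS:
        return "en"
    return "other"


def bucket(utterance: str) -> str:
    """Assign an utterance to the Hindi, English, or Hinglish bucket."""
    langs = {tag_word(w) for w in utterance.split()} - {"other"}
    if langs == {"hi"}:
        return "Hindi"
    if langs == {"en"}:
        return "English"
    if langs == {"hi", "en"}:
        return "Hinglish"
    return "Unknown"  # no recognised words at all


def character_profiles(dialogues):
    """Count bucket labels per speaker from (speaker, line) pairs."""
    profiles = defaultdict(Counter)
    for speaker, line in dialogues:
        profiles[speaker][bucket(line)] += 1
    return profiles


rani = ("Mein akele road cross kar rahi hoon, akele Eiffel tower dekh rahi hoon, "
        "akele gunde ko fight kar rahi hoon")
print(bucket(rani))  # Hinglish
```

A dictionary lookup like this breaks down quickly on real scripts, because romanized Hindi has no fixed spelling and some words are valid in both languages; a trained word-level language identifier would normally take the place of `tag_word`.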
“If a movie is set in a contemporary urban scenario, there is much more language mixing than a period film or a movie set in a small town,” said Kalika Bali, a researcher at Microsoft. “Queen, D-Day, Pink are more in the urban setting and have more code mixing, which the characters do, as compared to Dum Laga Ke Haisha or a Dedh Ishqiya, which are set in older times or a more suburban setting, where the code mixing is less. They use more Hindi,” she added.
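Comparisons like Bali's, that some films contain more code mixing than others, can be made quantitative. One measure from the code-mixing research literature is the Code-Mixing Index (CMI): for an utterance with n tokens, u of which are language-independent (names, numbers), and w_max tokens in its most frequent language, CMI = 100 × (1 − w_max / (n − u)) when n > u, and 0 otherwise. The article does not say which measure, if any, Microsoft uses, so this is an illustrative choice. The minimal sketch below assumes each token has already been tagged (for example `"hi"`, `"en"`, or `"other"`); the function names are hypothetical and the tag sequences in the tests are made up, not data from any film.

```python
from statistics import mean


def cmi(tags: list[str]) -> float:
    """Code-Mixing Index for one utterance, given per-token language tags.

    The tag 'other' marks language-independent tokens, which are excluded;
    every other tag is treated as the name of a language.
    """
    n = len(tags)
    counts: dict[str, int] = {}
    for t in tags:
        if t != "other":
            counts[t] = counts.get(t, 0) + 1
    u = n - sum(counts.values())  # number of language-independent tokens
    if n == u:  # no language-tagged tokens at all
        return 0.0
    return 100.0 * (1 - max(counts.values()) / (n - u))


def corpus_cmi(tagged_utterances) -> float:
    """Average CMI over all tagged utterances of a script."""
    return mean(cmi(tags) for tags in tagged_utterances)
```

Averaging `cmi` over every tagged line of a script with `corpus_cmi` would give one number per film, so an urban film with heavy mixing should score higher than a period drama that sticks mostly to Hindi; a purely monolingual line scores 0 and an evenly split two-language line scores 50.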
Such insights could help companies like Microsoft build better, more socially aware bots and improve the quality of interactions. “The culmination of this research would be to create a socio-culturally relevant chatbot or a smarter Cortana (Microsoft's digital assistant) in the future,” Choudhury said.
Choudhury says that if a son or daughter asks a chatbot for an anniversary gift for their mother, the bot might well answer: “You should get her a tiara.” “For an Indian mother, a tiara doesn't hold any cultural significance as compared to, say, a saree,” says Choudhury.
For now, the researchers say they are hamstrung by the paucity of freely available scripts and have leaned on independent scriptwriters and filmmakers to source scripts. “If studios like Yash Raj could put out movie scripts for building AI models and testing, there will be nothing like it,” Bali added.
Hinglish scripts are just the beginning, say researchers. “We are talking about Indian languages. Tamil, Kannada, Malayalam have such vibrant movie industries, which would be a great help for the research community to build better local language AI,” said Bali.
For now, Microsoft has been working primarily on code mixing in English-Hindi, English-Telugu and English-Bengali.
Source: http://tech.economictimes.indiatimes.com/news/technology/microsoft-is-using-bollywood-movies-to-build-better-local-language-ai-models/61014436