IIIT Hyderabad Kicks Off Largest Crowdsourcing Speech Project To Connect Voice with Vernacular

More than 50% of Indians use devices embedded with AI-based speech recognition technology, but these are dominated by English-speaking assistants such as Siri, Alexa, and the Google Assistant. Owing to its expertise in Language and Speech Processing, the International Institute of Information Technology Hyderabad (IIITH) has now joined forces with the Indian government to assist with the automatic speech recognition (ASR) module for the translation of Indian languages.

This project supports the Technology Development for Indian Languages (TDIL) initiative of the Ministry of Electronics and Information Technology, which aims to enable the widespread proliferation of ICT in all Indian languages. The work involves automatic speech recognition (ASR), speech-to-speech translation, and speech-to-text translation.

The project is headed by Prakash Yalla, Head, Technology Transfer Office, and Dr. Anil Kumar Vuppala, Associate Professor, Speech Processing Centre.

Building ‘Indian language Alexas’ requires thousands of hours of speech data, along with the transcribed text of the same. Dr. Vuppala says, “In our lab, we have been working on speech recognition technology for the last 10 years and have collected 50-60 hours of data. But we now need 1000s of such hours, which is very laborious.”

To reach the common man, conversational AI has to work in as natural a setting as possible. As a cost-effective way to gather the data this requires, the project is turning to crowdsourcing of speech data. Taking advantage of the institute's location in Hyderabad, the pilot invites volunteers to contribute Telugu speech data.

“The idea is to collect around 2000 hours of spoken Telugu over the course of a year. This can be through liaising with academic institutions across Andhra Pradesh and Telangana as well as via the existing Telugu Wikipedia community,” says Prakash.

The team is also working with industry partners such as OzoneTel and Pactera Edge and using their networks to get access to data. The initial collection of Telugu speech data is expected to establish the protocols and systems for crowdsourcing data for all Indian languages, the largest such exercise undertaken in the country.