The government has launched BharatGen, a pioneering initiative in generative artificial intelligence (GenAI) designed to revolutionise public service delivery and boost citizen engagement by developing a suite of foundational models in language, speech, and computer vision.

This initiative marks the world’s first government-funded multimodal large language model project focused on creating efficient and inclusive AI in Indian languages. It also seeks to create foundational AI models specifically tailored for India, reduce reliance on foreign technologies and strengthen the domestic AI ecosystem for startups, industries, and government agencies.

Spearheaded by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS) of the Department of Science and Technology (DST), the initiative will create GenAI systems that can generate high-quality text and multimodal content in various Indian languages.

BharatGen will deliver GenAI models and their applications as a public good, prioritising India’s socio-cultural and linguistic diversity. It aims to address broader national needs such as social equity, cultural preservation, and linguistic inclusion, while ensuring that GenAI reaches all segments of society.

BharatGen has four key distinguishing features: multilingual and multimodal foundation models; models built and trained on Bhartiya (Indian) datasets; an open-source platform; and the development of a GenAI research ecosystem in the country. The project is expected to be completed in two years and is planned to benefit several government, private, educational, and research institutions.

BharatGen will cater to both text and speech, ensuring coverage across India’s diverse linguistic landscape. By training on multilingual datasets, it will deeply capture the nuances of Indian languages, which are often underrepresented in global AI models. Further, unlike models that rely on global datasets, BharatGen focuses on developing processes for collecting and curating India-centric data, ensuring that the country’s diverse languages, dialects, and cultural contexts are accurately represented. This emphasis on data sovereignty strengthens India’s control over its digital resources and narrative.

A core feature of BharatGen is its focus on data-efficient learning, particularly for Indian languages with limited digital presence. Through fundamental research and collaboration with academic institutions, the initiative will develop models that are effective with minimal data, a critical need for languages underserved by global AI initiatives. BharatGen will also foster a vibrant AI research community through training programmes, hackathons, and collaborations with global experts.

Furthermore, BharatGen’s roadmap outlines key milestones up to July 2026. These include extensive AI model development, experimentation, and the establishment of AI benchmarks tailored to India’s needs. BharatGen will also focus on scaling AI adoption across industries and public initiatives.