Centre to start collecting India AI datasets from January 2025

The Indian government reportedly plans to begin collecting and aggregating datasets from ministries, government departments, and both public and private records to build a comprehensive database of non-personal information for the India AI Datasets Platform. Once collected, the data will be refined to improve its accessibility, quality, and usability, according to a senior government official who requested anonymity.

Collecting and refining the data is expected to take about six months. The government will then make the compiled database available for training various language models, including large language models (LLMs), small language models, ultra-large language models, and foundational models.

Additionally, the Ministry of Electronics and Information Technology (MeitY) intends to let domestic startups and companies use the trained data to develop application programming interfaces (APIs). These APIs will interact with the language models built on the database to provide specialised services.