VayuChat: An LLM-Powered Conversational Interface for Air Quality Data Analytics

Vedant Acharya, Abhay Pisharodi, Rishabh Mondal, Mohammad Rafiuddin, Nipun Batra

TL;DR

The paper tackles the challenge of turning dispersed Indian air-quality data into actionable policy insights. It introduces VayuChat, an LLM-powered conversational interface that outputs both executable Python code and interactive visualizations, integrating CPCB measurements, NCAP funding, and population data. Through a Delhi case study, it demonstrates that wind speed correlates inversely with PM$_{2.5}$ and shows end-to-end analytics, with the generated code exposed for reproducibility. The platform is publicly deployed, and the paper outlines applications for policymakers, journalists, and educators, with future work on live data streams and richer datasets.

Abstract

Air pollution causes about 1.6 million premature deaths each year in India, yet decision makers struggle to turn dispersed data into decisions. Existing tools require expertise and provide static dashboards, leaving key policy questions unresolved. We present VayuChat, a conversational system that answers natural language questions on air quality, meteorology, and policy programs, and responds with both executable Python code and interactive visualizations. VayuChat integrates data from Central Pollution Control Board (CPCB) monitoring stations, state-level demographics, and National Clean Air Programme (NCAP) funding records into a unified interface powered by large language models. Our live demonstration will show how users can perform complex environmental analytics through simple conversations, making data science accessible to policymakers, researchers, and citizens. The platform is publicly deployed at https://huggingface.co/spaces/SustainabilityLabIITGN/VayuChat. A demonstration video is available at https://www.youtube.com/watch?v=d6rklL05cs4.

Paper Structure

This paper contains 18 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: VayuChat interface with key features: (1) Select Model, an AI model dropdown (gpt-oss-120b); (2) Quick Queries, predefined air quality questions (e.g., PM$_{2.5}$ comparisons, trends); (3) Generated Code, which shows the generated Python code for analysis and visualization; and (4) Custom Queries, a natural-language input. The example query shows the highest PM$_{2.5}$ level in 2023, with Byrnihat recording 151.51 $\mu$g/m$^3$.
  • Figure 2: Flow diagram of VayuChat workflow: the user submits a query, the selected LLM generates code, which is executed in Python with the relevant dataset, and the output is presented.
  • Figure 3: (Generated by VayuChat) Wind Speed vs PM$_{2.5}$ Concentration - December 2024 Critical Week
  • Figure 4: (Generated by VayuChat) Five-Year December PM$_{2.5}$ and Wind Speed Trends (2019-2024)
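
The workflow summarized in the Figure 2 caption (query, code generation, execution against the dataset, output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not VayuChat's actual implementation: `fake_llm` is a hypothetical stand-in that returns a fixed code string instead of calling a real model, `DATA` is a made-up toy table rather than CPCB data, and `run_query` is a hypothetical helper name.

```python
from typing import Callable

# Hypothetical toy records standing in for the real dataset (values are made up).
DATA = [
    {"city": "A", "year": 2023, "pm25": 80.0},
    {"city": "B", "year": 2023, "pm25": 120.0},
    {"city": "A", "year": 2022, "pm25": 95.0},
]


def fake_llm(question: str) -> str:
    """Stand-in for the selected LLM: returns Python source for the question.

    A real system would send the question (plus a schema description) to a
    model; here the generated code is hard-coded so the sketch runs offline.
    """
    return (
        "rows = [r for r in data if r['year'] == 2023]\n"
        "top = max(rows, key=lambda r: r['pm25'])\n"
        "answer = f\"{top['city']}: {top['pm25']}\"\n"
    )


def run_query(question: str, llm: Callable[[str], str], data: list) -> dict:
    """Generate code for the question, execute it, and return code plus answer."""
    code = llm(question)
    # The generated code sees only `data` and must assign its result to `answer`.
    namespace = {"data": data}
    exec(code, namespace)  # NOTE: no sandboxing here; untrusted code needs isolation.
    return {"code": code, "answer": namespace.get("answer")}


result = run_query("Which city had the highest PM2.5 in 2023?", fake_llm, DATA)
print(result["code"])    # the generated code, shown to the user for reproducibility
print(result["answer"])  # the computed result
```

Returning the generated code alongside the answer mirrors the "Generated Code" panel described in Figure 1: the user can inspect and rerun the analysis. A deployed system would additionally need to isolate the execution step, since `exec` on model output is unsafe as written.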