An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant

Mohit Tomar; Abhisek Tiwari; Tulika Saha; Prince Jha; Sriparna Saha

TL;DR

This work tackles the lack of plant-care dialogue data and multimodal support by introducing the Plantational dataset and the EcoSage dialogue assistant. Plantational comprises approximately one thousand plant-care conversations with accompanying images and annotated intents and dialogue acts, enabling multimodal evaluation. The authors benchmark multiple LLMs and VLMs under zero-shot, few-shot, and fine-tuning settings and propose EcoSage, which uses BLIP-2 visual encoding and LoRA-based adapters within Vicuna for multimodal response generation. Results indicate that incorporating images improves context-specific responses, but multimodal alignment remains challenging, which underscores the need for semantic evaluation metrics such as BERT-F1. The work lays a foundation for practical, multimodal plant-care assistants.

Abstract

In recent times, awareness of imminent environmental challenges has grown, and people are showing a stronger dedication to caring for the environment and nurturing green life. The current $19.6 billion indoor gardening industry reflects this sentiment: it signifies not only monetary value but also a profound human desire to reconnect with the natural world. However, several recent surveys cast a revealing light on the fate of plants in our care, with more than half dying, primarily because of improper care. Thus, the need for accessible expertise that can guide individuals through the intricacies of plant care is more pressing than ever. In this work, we make the very first attempt at building a plant care assistant, which aims to assist people with plant(-ing) concerns through conversations. We propose a plant care conversational dataset named Plantational, which contains around 1K dialogues between users and plant care experts. Our proposed end-to-end approach is two-fold: (i) we first benchmark the dataset with various large language models (LLMs) and a visual language model (VLM) by studying the impact of instruction tuning (zero-shot and few-shot prompting) and fine-tuning techniques on this task; (ii) we then build EcoSage, a multi-modal dialogue generation framework for plant care assistance, which incorporates adapter-based modality infusion using a gated mechanism. We performed an extensive examination (both automated and manual evaluation) of the performance of the various LLMs and the VLM in generating domain-specific dialogue responses, to underscore the respective strengths and weaknesses of these diverse models.

Paper Structure (20 sections, 1 equation, 6 figures, 3 tables)

Introduction
Related Works
Dataset
Data Collection
Data Creation and Annotation
Plantational Dataset
Methodology
Benchmark Setup
Proposed Model
Textual and Visual Encoding.
Parameter Efficient Fine-tuning.
Response Generation.
Implementation Details
Results and Discussion
Experimental Results
...and 5 more sections

Figures (6)

Figure 1: A conversational illustration between a user and the agent
Figure 2: (a) Indicates original Reddit post; (b) Converted Reddit post into a conversation between a User and the Agent
Figure 3: Class distribution in terms of % representation (a) for intent categories, (b) for DA categories in the Plantational dataset
Figure 4: Proposed model. $A$ and $B$ represent the LoRA modules. Frozen weight represents the frozen weights of Multi-head attention; $x$ and $h$ are hidden representations before and after applying the LoRA module. In our model, LoRA and Linear Projection Layer are trainable while the rest is frozen.
Figure 5: Human evaluation scores of different models based on diverse metrics
...and 1 more figure
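
The Figure 4 caption describes a frozen attention weight combined with trainable LoRA modules $A$ and $B$ that map a hidden representation $x$ to $h$. The sketch below illustrates that idea with the standard LoRA formulation, h = x W0^T + (alpha / r) x A^T B^T. It is a minimal NumPy illustration, not the authors' implementation: the function name `lora_forward`, the scaling factor `alpha`, the rank `r`, and all dimensions are assumptions chosen for the example, since this page does not state the paper's hyperparameters.

```python
import numpy as np


def lora_forward(x, W0, A, B, alpha):
    """Apply a frozen linear map plus a low-rank LoRA update.

    x:  (batch, d_in) hidden representation before the layer
    W0: (d_out, d_in) frozen pretrained weight (never updated)
    A:  (r, d_in)     trainable down-projection
    B:  (d_out, r)    trainable up-projection
    alpha: scaling hyperparameter (illustrative value, an assumption)
    """
    r = A.shape[0]
    frozen_path = x @ W0.T                        # output of the frozen weight
    lora_path = (alpha / r) * (x @ A.T) @ B.T     # low-rank correction
    return frozen_path + lora_path


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2

W0 = rng.normal(size=(d_out, d_in))          # frozen
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, small random init
B = np.zeros((d_out, r))                     # trainable, zero init

x = rng.normal(size=(3, d_in))
h = lora_forward(x, W0, A, B, alpha=16.0)

# Parameter counts: the low-rank pair versus the full weight matrix.
trainable_params = A.size + B.size
frozen_params = W0.size
```

Because `B` starts at zero, the LoRA path contributes nothing at initialization and `h` equals the frozen layer's output; training then updates only `A` and `B`, which together hold r * (d_in + d_out) parameters instead of d_in * d_out.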
