Agentic Knowledgeable Self-awareness
Shuofei Qiao, Zhisong Qiu, Baochang Ren, Xiaobin Wang, Xiangyuan Ru, Ningyu Zhang, Xiang Chen, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
TL;DR
This paper tackles the brittleness and inefficiency of existing agent planning by introducing agentic knowledgeable self-awareness, a data-centric paradigm that enables LLM-based agents to autonomously regulate knowledge usage according to situational demands. The KnowSelf framework constructs a lightweight knowledge base and a situation-aware data pipeline that marks the agent's self-explored trajectories with special tokens signaling fast thinking, slow thinking, or knowledge-based thinking. It then trains agents in two stages: supervised fine-tuning (SFT), followed by a preference-optimization stage that blends DPO and RPO-style objectives. Empirical results on ALFWorld and WebShop show that KnowSelf achieves superior planning performance with minimal external knowledge, and scaling analyses reveal strong generalization as well as an emergence of self-awareness in the model's later layers. The approach reduces knowledge-injection costs, improves robustness to distribution shifts, and offers a concrete path toward more autonomous, resource-efficient knowledge-aware agents for real-world planning tasks.
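The second training stage can be illustrated with a small loss function. This is a minimal sketch, assuming the RPO-style formulation in which the standard DPO loss on a (preferred, dispreferred) trajectory pair is summed with a length-normalized negative log-likelihood (NLL) term on the preferred trajectory. The container `PairLogProbs` and the defaults `beta=0.1` and `alpha=1.0` are hypothetical choices, not values taken from the paper. The sketch works on precomputed sequence log-probabilities so that it runs with the standard library alone.

```python
import math
from dataclasses import dataclass


@dataclass
class PairLogProbs:
    """Summed token log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float    # log pi_theta(y_w | x)
    policy_rejected: float  # log pi_theta(y_l | x)
    ref_chosen: float       # log pi_ref(y_w | x), from a frozen reference model
    ref_rejected: float     # log pi_ref(y_l | x)
    chosen_len: int         # token count of y_w, used to length-normalize the NLL term


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(p: PairLogProbs, beta: float) -> float:
    # Implicit reward margin of the chosen over the rejected trajectory, relative to the reference.
    margin = (p.policy_chosen - p.ref_chosen) - (p.policy_rejected - p.ref_rejected)
    return -log_sigmoid(beta * margin)


def rpo_style_loss(p: PairLogProbs, beta: float = 0.1, alpha: float = 1.0) -> float:
    # DPO term plus alpha times a length-normalized NLL on the chosen trajectory.
    nll = -p.policy_chosen / p.chosen_len
    return dpo_loss(p, beta) + alpha * nll


# Toy usage: policy identical to the reference, so the DPO margin is zero.
pair = PairLogProbs(policy_chosen=-10.0, policy_rejected=-12.0,
                    ref_chosen=-10.0, ref_rejected=-12.0, chosen_len=5)
print(round(rpo_style_loss(pair), 4))  # → 2.6931
```

When the policy matches the reference, the DPO term reduces to log 2 and only the NLL term separates examples. In a real training loop these log-probabilities would come from the policy being trained and a frozen copy of the SFT model.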
Abstract
Large Language Models (LLMs) have achieved strong performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness during decision-making: the ability to dynamically assess situational demands and strategically employ resources. To address this gap, we propose agentic knowledgeable self-awareness, a novel paradigm that enables LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that equips agents with human-like knowledgeable self-awareness. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent's self-explored trajectories for collecting training data. Through a two-stage training process, the agent model learns to switch between different situations by generating specific special tokens, achieving optimal planning performance at minimal cost. Our experiments demonstrate that KnowSelf outperforms various strong baselines across different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.
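The heuristic situation judgement criterion can be pictured as a per-step cascade. This is a minimal sketch, assuming that each step of a self-explored trajectory is compared with a gold action. A step the agent already gets right is kept as fast thinking with no special token. A step it fixes after reflecting on its own attempt is marked as slow thinking. A step it still gets wrong is marked as knowledge-based thinking and paired with a retrieved knowledge entry. The token strings `<reflection>` and `<knowledge>`, the `Step` type, and the callbacks `act`, `reflect`, and `retrieve` are all hypothetical stand-ins; the actual criterion and tokens in KnowSelf may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple

# Hypothetical special tokens; KnowSelf's actual token strings may differ.
REFLECT_TOKEN = "<reflection>"
KNOWLEDGE_TOKEN = "<knowledge>"


class Situation(Enum):
    FAST = "fast"            # first attempt already correct: no special token
    SLOW = "slow"            # corrected after self-reflection
    KNOWLEDGE = "knowledge"  # still wrong after reflection: needs external knowledge


@dataclass
class Step:
    observation: str
    gold_action: str


def label_step(
    step: Step,
    act: Callable[[str], str],           # agent's first-attempt action for an observation
    reflect: Callable[[str, str], str],  # agent's revised action after reflecting on its attempt
    retrieve: Callable[[str], str],      # lookup in a small knowledge base
) -> Tuple[Situation, str]:
    """Assign one step to a situation and build its training target (sketch)."""
    first = act(step.observation)
    if first == step.gold_action:
        return Situation.FAST, first
    revised = reflect(step.observation, first)
    if revised == step.gold_action:
        # A full pipeline would also keep the reflection text; only the token is shown here.
        return Situation.SLOW, f"{REFLECT_TOKEN} {revised}"
    knowledge = retrieve(step.observation)
    return Situation.KNOWLEDGE, f"{KNOWLEDGE_TOKEN} {knowledge} {step.gold_action}"


def label_trajectory(steps: List[Step], act, reflect, retrieve) -> List[Tuple[Situation, str]]:
    return [label_step(s, act, reflect, retrieve) for s in steps]


# Toy usage with lookup tables standing in for the agent and the knowledge base.
first_try = {"see mug": "take mug", "mug dirty": "put mug", "need heat": "go to fridge"}
after_reflection = {"mug dirty": "clean mug"}
knowledge_base = {"need heat": "Heating is done with the microwave."}

steps = [
    Step("see mug", "take mug"),
    Step("mug dirty", "clean mug"),
    Step("need heat", "go to microwave"),
]
labels = label_trajectory(
    steps,
    act=lambda obs: first_try[obs],
    reflect=lambda obs, attempt: after_reflection.get(obs, attempt),
    retrieve=lambda obs: knowledge_base.get(obs, ""),
)
for situation, target in labels:
    print(situation.value, "|", target)
```

The cascade tries the cheapest option first: fast thinking needs nothing extra, slow thinking needs one more generation, and knowledge-based thinking needs a retrieval. This ordering is meant to reflect the goal of calling on external knowledge only when a step actually requires it.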
