
FISformer: Replacing Self-Attention with a Fuzzy Inference System in Transformer Models for Time Series Forecasting

Bulent Haznedar, Levent Karacan

Abstract

Transformers have achieved remarkable progress in time series forecasting, yet their reliance on deterministic dot-product attention limits their capacity to model uncertainty and nonlinear dependencies across multivariate temporal dimensions. To address this limitation, we propose FISFormer, a Fuzzy Inference System-driven Transformer that replaces conventional attention with a FIS Interaction mechanism. In this framework, each query-key pair undergoes a fuzzy inference process for every feature dimension, where learnable membership functions and rule-based reasoning estimate token-wise relational strengths. These FIS-derived interaction weights capture uncertainty and provide interpretable, continuous mappings between tokens. A softmax operation is applied along the token axis to normalize these weights, which are then combined with the corresponding value features through element-wise multiplication to yield the final context-enhanced token representations. This design fuses the interpretability and uncertainty modeling of fuzzy logic with the representational power of Transformers. Extensive experiments on multiple benchmark datasets demonstrate that FISFormer achieves superior forecasting accuracy, noise robustness, and interpretability compared to state-of-the-art Transformer variants, establishing fuzzy inference as an effective alternative to conventional attention mechanisms.
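
The normalization and aggregation steps described above can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $q_{i,d}$, $k_{j,d}$, and $v_{j,d}$ denote feature $d$ of query token $i$, key token $j$, and value token $j$; $\mathrm{FIS}_d$ is the fuzzy inference system for dimension $d$; $N$ is the number of tokens. The abstract specifies only an element-wise multiplication with the values, so the final sum over $j$ is an assumption about how the weighted values are aggregated.

```latex
s_{ij,d} = \mathrm{FIS}_d\left(q_{i,d},\, k_{j,d}\right), \qquad
\alpha_{ij,d} = \frac{\exp\left(s_{ij,d}\right)}{\sum_{j'=1}^{N} \exp\left(s_{ij',d}\right)}, \qquad
o_{i,d} = \sum_{j=1}^{N} \alpha_{ij,d}\, v_{j,d}
```

Unlike dot-product attention, which yields a single scalar weight per token pair, this formulation keeps a separate weight $\alpha_{ij,d}$ for every feature dimension.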

Paper Structure

This paper contains 16 sections, 13 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Overall architecture of the proposed model, built upon a Transformer encoder. It comprises an embedding layer to convert multivariate input sequences into dense representations, a Transformer block to capture temporal and feature-wise dependencies via the proposed FIS-based Token Interaction module, and a projection module to map encoded features to the prediction output.
  • Figure 2: FIS-based Token Interaction. Query and key tokens are fuzzified through learnable Gaussian membership functions and combined via rule-based fuzzy inference. The resulting fuzzy rules are defuzzified following a first-order Sugeno process to produce the Fuzzy Interaction Map.
  • Figure 3: FIS Interaction Output. The Fuzzy Interaction Map encodes inferred relational strengths across all token dimensions. A softmax operation is applied along the token axis to obtain normalized interaction weights, which are then combined with value representations via element-wise multiplication to produce context-enhanced token embeddings.
  • Figure 4: Visualization of input and prediction results on the Traffic, ECL, and PEMS datasets.
  • Figure 5: Impact of learning rate on MSE scores for the ETT (Avg), ECL, Traffic, and Weather datasets.
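
The pipeline described in the captions of Figures 2 and 3 (Gaussian fuzzification, first-order Sugeno inference, softmax along the token axis, element-wise combination with values) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `params` layout, the product t-norm for rule firing strengths, the two membership functions per input, and the final sum over key tokens are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of scalar x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_score(q, k, params):
    """First-order Sugeno inference for one (query, key) pair in one feature.

    params (hypothetical layout):
      "q_mf", "k_mf": lists of (center, sigma) pairs
      "coef": coef[m][n] = (a, b, c); rule (m, n) outputs a*q + b*k + c
    """
    num = 0.0
    den = 0.0
    for m, (cq, sq) in enumerate(params["q_mf"]):
        for n, (ck, sk) in enumerate(params["k_mf"]):
            # Firing strength via product t-norm (an assumption).
            fire = gaussian_mf(q, cq, sq) * gaussian_mf(k, ck, sk)
            a, b, c = params["coef"][m][n]
            num += fire * (a * q + b * k + c)
            den += fire
    # Defuzzify: firing-strength-weighted average of rule outputs.
    return num / (den + 1e-9)


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fis_interaction(Q, K, V, params_per_dim):
    """Q, K, V: lists of N tokens, each a list of D features.

    Returns (out, weights): out[i][d] is the context-enhanced feature,
    weights[i][j][d] is the normalized interaction weight.
    """
    n_tokens, n_dims = len(Q), len(Q[0])
    out = [[0.0] * n_dims for _ in range(n_tokens)]
    weights = [[[0.0] * n_dims for _ in range(n_tokens)]
               for _ in range(n_tokens)]
    for d in range(n_dims):
        for i in range(n_tokens):
            # One row of the fuzzy interaction map for query i, feature d.
            scores = [sugeno_score(Q[i][d], K[j][d], params_per_dim[d])
                      for j in range(n_tokens)]
            w = softmax(scores)  # normalize along the (key) token axis
            for j in range(n_tokens):
                weights[i][j][d] = w[j]
            # Element-wise weighting of values, summed over keys (assumption).
            out[i][d] = sum(w[j] * V[j][d] for j in range(n_tokens))
    return out, weights


# Toy usage with fixed, hand-picked parameters (in a real model these
# would be learned).
random.seed(0)
N, D = 4, 3


def rand_tokens():
    return [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(N)]


Q, K, V = rand_tokens(), rand_tokens(), rand_tokens()
params = {
    "q_mf": [(-0.5, 1.0), (0.5, 1.0)],
    "k_mf": [(-0.5, 1.0), (0.5, 1.0)],
    "coef": [[(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)],
             [(-1.0, 1.0, 0.0), (0.5, 0.5, 0.1)]],
}
out, weights = fis_interaction(Q, K, V, [params] * D)
```

Because the weights for each query token and feature sum to one over the key tokens, every output feature in this sketch is a convex combination of the corresponding value features. A practical implementation would vectorize these loops on tensors; the nested loops are kept here only to make each step explicit.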