Deep Learning with Pretrained 'Internal World' Layers: A Gemma 3-Based Modular Architecture for Wildfire Prediction

Ayoub Jadouli, Chaker El Amrani

TL;DR

This work addresses wildfire prediction by reusing the middle transformer layers of a large pretrained model (Gemma 3) as a frozen internal world within a lightweight tabular prediction network. By projecting 276 wildfire-related features through a four-branch PMFFNN into Gemma 3's hidden space and freezing the middle layers, the approach achieves high recall with a small trainable parameter footprint. Experiments on a Moroccan dataset show that the Internal World model often yields the best recall and competitive discrimination (AUC), while requiring far fewer trainable parameters than the fully trained baselines. The results highlight data-efficient, modular reuse of pretrained transformers as memory-augmented predictors for environmental applications, with practical implications for scalable, interpretable wildfire risk management.

Abstract

Deep learning models, especially large Transformers, carry substantial "memory" in their intermediate layers: an internal world that encodes a wealth of relational and contextual knowledge. This work harnesses that internal world for wildfire occurrence prediction by introducing a modular architecture built upon Gemma 3, a state-of-the-art multimodal model. Rather than relying on Gemma 3's original embedding and positional encoding stacks, we develop a custom feed-forward module that transforms tabular wildfire features into the hidden dimension required by Gemma 3's mid-layer Transformer blocks. We freeze these Gemma 3 sub-layers, thus preserving their pretrained representation power, while training only the smaller input and output networks. This approach minimizes the number of trainable parameters and reduces the risk of overfitting on limited wildfire data, yet retains the benefits of Gemma 3's broad knowledge. Evaluations on a Moroccan wildfire dataset demonstrate improved predictive accuracy and robustness compared to standard feed-forward and convolutional baselines. Ablation studies confirm that the frozen Transformer layers consistently contribute to better representations, underscoring the feasibility of reusing large-model mid-layers as a learned internal world. Our findings suggest that strategic modular reuse of pretrained Transformers can enable more data-efficient and interpretable solutions for critical environmental applications such as wildfire risk management.
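
The data flow described above (trainable input network, frozen middle block, trainable output network) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the hidden width of 8, the equal 69-feature split across the four branches, the two-layer frozen stack, and every class and function name are hypothetical. The frozen block here is a pair of small residual dense layers standing in for pretrained Gemma 3 Transformer blocks, which this sketch does not load.

```python
import math
import random

N_FEATURES = 276   # feature count stated in the TL;DR
N_BRANCHES = 4     # branch count stated in the TL;DR
HIDDEN = 8         # hypothetical toy width; the real hidden size is much larger


class Linear:
    """Dense layer y = Wx + b, with a flag saying whether training may update it."""

    def __init__(self, n_in, n_out, rng, trainable=True):
        scale = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.trainable = trainable

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bias
                for row, bias in zip(self.w, self.b)]

    def n_params(self):
        return len(self.w) * len(self.w[0]) + len(self.b)


def relu(v):
    return [max(0.0, a) for a in v]


class InternalWorldSketch:
    """Trainable input branches -> frozen middle block -> trainable output head."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Assumption: the 276 features are split evenly, 69 per branch.
        self.per_branch = N_FEATURES // N_BRANCHES
        self.branches = [Linear(self.per_branch, HIDDEN, rng)
                         for _ in range(N_BRANCHES)]
        self.merge = Linear(N_BRANCHES * HIDDEN, HIDDEN, rng)
        # Stand-in for the pretrained mid-layers: marked non-trainable.
        self.frozen = [Linear(HIDDEN, HIDDEN, rng, trainable=False)
                       for _ in range(2)]
        self.head = Linear(HIDDEN, 1, rng)

    def layers(self):
        return self.branches + [self.merge] + self.frozen + [self.head]

    def forward(self, features):
        parts = []
        for i, branch in enumerate(self.branches):
            chunk = features[i * self.per_branch:(i + 1) * self.per_branch]
            parts.extend(relu(branch(chunk)))
        h = relu(self.merge(parts))          # now in the "hidden space"
        for layer in self.frozen:            # residual pass through frozen block
            h = [a + b for a, b in zip(h, relu(layer(h)))]
        logit = self.head(h)[0]
        return 1.0 / (1.0 + math.exp(-logit))  # probability of fire occurrence

    def count_params(self):
        trainable = sum(l.n_params() for l in self.layers() if l.trainable)
        frozen = sum(l.n_params() for l in self.layers() if not l.trainable)
        return trainable, frozen


model = InternalWorldSketch(seed=0)
probability = model.forward([0.5] * N_FEATURES)
trainable, frozen = model.count_params()
```

A training loop built on this sketch would update only the layers whose `trainable` flag is set and leave the frozen stack untouched. The parameter counts at this toy width say nothing about the real model, where the frozen pretrained layers hold most of the parameters.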

Paper Structure

This paper contains 49 sections, 1 equation, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Text-based schematic of the Internal-World Model. Brackets mark frozen Gemma layers; solid boxes are trainable.
  • Figure 2: Three-layer Feed-Forward Network (FFN-3L) baseline.
  • Figure 3: One-dimensional convolutional network (CNN-1D) baseline.
  • Figure 4: MLP baseline with learned per-feature embeddings and positional tokens (PE-MLP).
  • Figure 5: Physics-Embedded Entropy hybrid baseline.
  • ...and 10 more figures