CubeletWorld: A New Abstraction for Scalable 3D Modeling

Azlaan Mustafa Samad, Hoang H. Nguyen, Lukas Berg, Henrik Müller, Yuan Xue, Daniel Kudenko, Zahra Ahmadi

TL;DR

CubeletWorld presents a discretized 3D grid framework to unify heterogeneous urban data into cubelets, enabling privacy-preserving, scalable reasoning across multiple granularities. It formalizes a state-prediction task on this grid and introduces the CubeletBoids dataset to benchmark occupancy forecasting in a controlled 3D environment. The authors propose two deep architectures, a CNN-LSTM and an A3T-GCN with subgraph decomposition, to address spatiotemporal cubelet forecasting and scalability challenges. Experiments reveal that high-resolution cubelets magnify sparsity and computational demands, with subgraph-based GNNs offering a practical path to scalable, accurate predictions. The work highlights privacy advantages, discusses limitations, and provides clear directions toward region-specific, hierarchical, and multi-modal cubelet-based modeling for urban analytics and emergency planning.

Abstract

Modern cities produce vast streams of heterogeneous data, from infrastructure maps to mobility logs and satellite imagery. However, integrating these sources into coherent spatial models for planning and prediction remains a major challenge. Existing agent-centric methods often rely on direct environmental sensing, limiting scalability and raising privacy concerns. This paper introduces CubeletWorld, a novel framework for representing and analyzing urban environments through a discretized 3D grid of spatial units called cubelets. This abstraction enables privacy-preserving modeling by embedding diverse data signals, such as infrastructure, movement, or environmental indicators, into localized cubelet states. CubeletWorld supports downstream tasks such as planning, navigation, and occupancy prediction without requiring agent-driven sensing. To evaluate this paradigm, we propose the CubeletWorld State Prediction task, which involves predicting the cubelet state using a realistic dataset containing various urban elements like streets and buildings through this discretized representation. We explore a range of modified core models suitable for our setting and analyze challenges posed by increasing spatial granularity, specifically the issue of sparsity in representation and scalability of baselines. In contrast to existing 3D occupancy prediction models, our cubelet-centric approach focuses on inferring state at the spatial unit level, enabling greater generalizability across regions and improved privacy compliance. Our results demonstrate that CubeletWorld offers a flexible and extensible framework for learning from complex urban data, and it opens up new possibilities for scalable simulation and decision support in domains such as socio-demographic modeling, environmental monitoring, and emergency response. The code and datasets can be downloaded from here.

Paper Structure

This paper contains 21 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: An example of CubeletWorld with cubelets spanning different entities across the city. Cubelets hold different types of information about the volumetric space they cover, such as entity type, occupancy, temperature, and air quality index, and may differ in size.
  • Figure 2: CubeletWorld Dataset Preprocessing: (a) The 3D CubeletWorld $\mathcal{E}$ is discretized into a total of $n$ cubelets, each of unit length, represented as a 3D array $\mathcal{M}_t \in \mathbb{R}^{n_1\times n_2\times n_3}$. Unit cubelets can be aggregated into larger cubelets, with the aggregation level chosen according to the resolution the use case requires; in this example, eight unit cubelets are aggregated into a single larger cubelet. An aggregated cubelet is labeled as occupied if any of its unit cubelets is occupied. (b) The time-series CubeletBoids dataset is converted into training samples by combining 10 historical time steps into a single sample that serves as input to the model.
  • Figure 3: A realistic 3D CubeletWorld featuring various terrain entities, such as tall buildings (gray, in the background), streets running vertically with trees alongside, and other static entities. Clusters of yellow boids (circled) are dispersed throughout different regions of the CubeletWorld.
  • Figure 4: Architecture of the 3DCNN-LSTM model, which consists of a 3D convolution layer followed by an LSTM layer. (a) A 3D convolution kernel convolves over the CubeletWorld sample, and the output is flattened. (b) The flattened output passes through the LSTM layer, whose hidden state is flattened and passed through a fully connected layer followed by a sigmoid function to predict the occupancy of each cubelet.
  • Figure 5: Architecture of the A3T-GCN model: (a) The discretized CubeletWorld is first converted to a graph. (b) The generated graph is used as input to the A3T-GCN model. (c) The A3T-GCN model combines the Temporal Graph Convolutional Network (T-GCN) (Zhao et al., 2019) with an attention mechanism. Here, $h_t$ and $c_t$ denote hidden and cell states, while $a_t$ (and $a_{t-n}$) are attention scores over past hidden states, forming a context vector $C_t$ for predicting future cubelet occupancy.
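
The preprocessing described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `aggregate_occupancy` and `make_windows` are hypothetical, the grid is assumed to be a binary occupancy array whose dimensions are divisible by the aggregation factor, and pairing each 10-step window with the next time step as the prediction target is an assumption (the caption only states that 10 historical steps form one input sample).

```python
import numpy as np


def aggregate_occupancy(grid, factor):
    """Merge factor x factor x factor blocks of unit cubelets into one cubelet.

    An aggregated cubelet is marked occupied (1) if any unit cubelet inside
    it is occupied, following the rule in the Figure 2 caption.
    """
    n1, n2, n3 = grid.shape
    if n1 % factor or n2 % factor or n3 % factor:
        raise ValueError("each grid dimension must be divisible by factor")
    # Split every axis into (coarse index, offset inside the block).
    blocks = grid.reshape(n1 // factor, factor,
                          n2 // factor, factor,
                          n3 // factor, factor)
    # Reduce over the three within-block axes with a logical OR.
    return blocks.any(axis=(1, 3, 5)).astype(grid.dtype)


def make_windows(series, history=10):
    """Turn a (T, n1, n2, n3) time series into (input, target) pairs.

    Each input stacks `history` consecutive grids; the target is the grid
    that immediately follows the window (an assumption, see lead-in).
    """
    num_steps = series.shape[0]
    if num_steps <= history:
        raise ValueError("series must be longer than the history length")
    inputs = np.stack([series[t:t + history]
                       for t in range(num_steps - history)])
    targets = series[history:]
    return inputs, targets


# Usage: a 4x4x4 grid with two occupied unit cubelets, aggregated by 2
# (eight unit cubelets per aggregated cubelet, as in the caption's example).
unit_grid = np.zeros((4, 4, 4), dtype=np.uint8)
unit_grid[0, 1, 1] = 1
unit_grid[3, 3, 3] = 1
coarse_grid = aggregate_occupancy(unit_grid, factor=2)

# Usage: 15 time steps of a 2x2x2 grid yield 5 windows of length 10.
series = np.zeros((15, 2, 2, 2), dtype=np.uint8)
inputs, targets = make_windows(series, history=10)
```

The any-occupied rule is equivalent to max pooling over a binary grid, which is why coarser resolutions produce denser grids and finer ones produce sparser grids.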
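
The grid-to-graph step in the Figure 5 caption can be illustrated with a small sketch. The caption does not say how edges are defined, so the version below assumes one node per cubelet and an undirected edge between face-adjacent cubelets (a 6-neighbourhood); the function name `grid_to_edges` and the row-major node numbering are also assumptions for illustration only.

```python
from itertools import product


def grid_to_edges(n1, n2, n3):
    """Return undirected edges between face-adjacent cubelets.

    Nodes are numbered in row-major order. Each edge is listed once,
    from the lower node index to the higher one.
    """
    def node_id(i, j, k):
        return (i * n2 + j) * n3 + k

    edges = []
    for i, j, k in product(range(n1), range(n2), range(n3)):
        # Link each cubelet to its neighbour in the positive direction
        # along every axis, so no edge is added twice.
        if i + 1 < n1:
            edges.append((node_id(i, j, k), node_id(i + 1, j, k)))
        if j + 1 < n2:
            edges.append((node_id(i, j, k), node_id(i, j + 1, k)))
        if k + 1 < n3:
            edges.append((node_id(i, j, k), node_id(i, j, k + 1)))
    return edges


# Usage: a 2x2x2 grid has 8 nodes and 12 face-adjacency edges.
edges = grid_to_edges(2, 2, 2)
```

Under this assumption the edge count grows linearly with the number of cubelets, as $(n_1-1)n_2n_3 + n_1(n_2-1)n_3 + n_1n_2(n_3-1)$, while the node count grows cubically with resolution, which is one way to see why splitting the graph into subgraphs becomes attractive at fine granularity.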