Introducing Feature-Based Trajectory Clustering, a clustering algorithm for longitudinal data

Marie-Pierre Sylvestre, Laurence Boulanger

Abstract

We present a new algorithm for clustering longitudinal data. Data of this type can be conceptualized as consisting of individuals and, for each individual, observations of a time-dependent variable made at various times. In general, the way this variable evolves with time differs from one individual to the next. However, there may also be commonalities: characteristic features of the time evolution shared by many individuals. The purpose of the method we put forward is to find clusters of individuals whose underlying time-dependent variables share such characteristic features. This is done in two steps. The first step maps each individual to a point in Euclidean space whose coordinates are determined by specific mathematical formulae meant to capture a variety of characteristic features. The second step finds the clusters by applying the Spectral Clustering algorithm to the resulting point cloud.
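
The two steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the three features (maximum, mean, overall slope) are placeholders for the paper's formulae, the function names `trajectory_features` and `spectral_bipartition` are hypothetical, and the second step is reduced to a two-way split based on the sign of the second eigenvector of a normalized graph Laplacian, a simplified stand-in for a full Spectral Clustering implementation.

```python
import numpy as np

def trajectory_features(t, y):
    """Step 1 (illustrative): map one trajectory to a point in Euclidean space.

    The three coordinates are placeholder features, not the paper's measures.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = (y[-1] - y[0]) / (t[-1] - t[0])  # overall rate of change
    return np.array([y.max(), y.mean(), slope])

def spectral_bipartition(points, sigma=1.0):
    """Step 2 (simplified): split a point cloud into two clusters.

    Uses the sign of the second eigenvector of the symmetric normalized
    graph Laplacian built from a Gaussian affinity (assumed choice).
    """
    X = np.asarray(points, dtype=float)
    # Standardize so that no single feature dominates the distances.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (eigvecs[:, 1] > 0).astype(int)

# Hypothetical data: three rising and three slowly falling trajectories.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
offsets = [0.0, 0.1, 0.2]
trajectories = [2.0 * t + o for o in offsets] + [3.0 - 0.5 * t + o for o in offsets]

cloud = np.array([trajectory_features(t, y) for y in trajectories])
labels = spectral_bipartition(cloud)
print(labels)  # the first three individuals share one label, the last three the other
```

Which cluster receives label 0 or 1 is arbitrary, since an eigenvector is only defined up to sign. For more than two clusters, one would typically embed the points using several eigenvectors and run k-means on the embedding.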

Paper Structure (14 sections, 66 equations, 20 figures, 2 tables)

Figures (20)

  • Figure 1: A trajectory of length $N=8$ and its underlying function $f(t)$. The observation times are $t_1=1$, $t_2=1.75$, $t_3=2.6$, $t_4=3$, $t_5=3.6$, $t_6=4.2$, $t_7=4.8$, $t_8=5$.
  • Figure 2: In this example, the maximum of the underlying function (in grey) is 10, so the value of the functional measure is 10. The underlying function has been observed five times, resulting in a trajectory (black points) of length 5. The maximum value of the trajectory is 9.5, so the value of the trajectory measure is 9.5.
  • Figure 3: On the left, the area under the curve (in grey) represents the value of the integral $\int_1^5f(t)\,dt$. On the right, this integral is approximated from a trajectory using the trapezoid rule of numerical integration. For any two consecutive points $(t_j,y_j)$, $(t_{j+1},y_{j+1})$, a rectangle is constructed with base equal to the distance $t_{j+1}-t_j$ between the observation times and height equal to the average of the $y$ coordinates, $(y_j+y_{j+1})/2$. The trapezoidal approximation to $\int_1^5f(t)\,dt$ is the sum of the areas of the 7 rectangles constructed this way. In the present case, the true value of the integral is 75.2 and the trapezoidal approximation is 75.1.
  • Figure 4: The image on the left shows, for each point of the trajectory, a small segment (in red) of the line whose slope is given by the approximation $D_j$ to the derivative. A good approximation is one for which the red line is almost tangent to the graph. We see that the approximation is lacking, especially at the fifth and sixth points, but at least every approximation carries the correct sign. The image on the right shows the graph of the derivative of the underlying function (pink) as well as its approximation $D_j$ at the observation times (red dots). We see that the approximation is farthest from the graph at the fifth and sixth points, in agreement with our previous observation.
  • Figure 5: The image on the left shows the graph of the derivative of the underlying function (pink) as well as its approximation $D_j$ at the observation times (red dots) and, in blue, a small segment of the line whose slope is approximated using $D_j^2$. A good approximation is one for which the blue line is almost tangent to the graph. The approximation is worse than that of $f'(t)$, which is to be expected since the approximation $D^2_j$ is itself built from an approximation. However, except for the first two points, the signs are correct. The image on the right shows the graph of the true value of $f''(t)$ (light blue) against the approximations $D_j^2$ (blue dots).
  • ...and 15 more figures
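
The trajectory measures illustrated in Figures 2 to 5 can be sketched numerically as follows. This is only an illustration under stated assumptions: the captions do not give the formulae for $D_j$ and $D_j^2$, so NumPy's `np.gradient` finite differences are used here as a stand-in, with the second-derivative estimate obtained by differencing the first. The function name `trajectory_measures` and the underlying function $f(t)=t^2$ are hypothetical; only the observation times are taken from Figure 1.

```python
import numpy as np

def trajectory_measures(t, y):
    """Compute a few illustrative measures from one trajectory (t_j, y_j)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Trapezoid rule as described for Figure 3: for each pair of consecutive
    # points, base t_{j+1} - t_j times the average height (y_j + y_{j+1}) / 2.
    widths = np.diff(t)
    avg_heights = (y[:-1] + y[1:]) / 2.0
    integral = float(np.sum(widths * avg_heights))

    # Assumed finite-difference stand-ins for D_j and D_j^2 (not the paper's
    # formulae). np.gradient accepts unevenly spaced observation times.
    d1 = np.gradient(y, t)
    d2 = np.gradient(d1, t)  # built from an approximation, so errors propagate

    return {"max": float(y.max()), "integral": integral, "d1": d1, "d2": d2}

# Observation times from Figure 1; f(t) = t**2 is a hypothetical underlying function.
t_obs = np.array([1, 1.75, 2.6, 3, 3.6, 4.2, 4.8, 5])
y_obs = t_obs ** 2

m = trajectory_measures(t_obs, y_obs)
print(m["max"])       # maximum of the trajectory
print(m["integral"])  # trapezoid estimate; the exact integral of t**2 on [1, 5] is 124/3
print(m["d1"])        # first-derivative estimates at the observation times
print(m["d2"])        # second-derivative estimates, less reliable near the ends
```

For this quadratic example the interior first-derivative estimates coincide with $2t_j$, while the estimates at the two endpoints use one-sided differences and are less accurate. Those endpoint errors carry over into the second-derivative estimates next to the boundary, which is one way an approximation built from another approximation can degrade.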