Temporal Clustering of Human Behavior

Decomposing a walking sequence into 4 clusters of movements.

Introduction

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the variability in the temporal scale of human actions, the complexity of representing articulated motion, and the exponential number of possible movement combinations.

We pose the problem of learning motion primitives as one of temporal clustering, and derive aligned cluster analysis (ACA) [2] and hierarchical aligned cluster analysis (HACA) [1]. ACA finds a partition of a given multi-dimensional time series into m disjoint segments, such that each segment belongs to one of k clusters. ACA combines kernel k-means with the dynamic time alignment kernel to cluster time series. Moreover, it provides a natural framework for finding a low-dimensional embedding of time series. ACA is optimized efficiently with a coordinate-descent strategy and dynamic programming.
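The authors' implementation is linked below. As a rough illustration of the underlying idea only (not ACA itself, which jointly optimizes the segment boundaries with dynamic programming), the sketch below clusters pre-cut 1-D segments of unequal length with a k-medoids loop over dynamic time warping distances; all function names here are hypothetical:

```python
import numpy as np

def dtw_dist(x, y):
    """Dynamic time warping distance between two 1-D segments."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def cluster_segments(segments, k, iters=20):
    """k-medoids over pairwise DTW distances (segmentation held fixed)."""
    n = len(segments)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_dist(segments[i], segments[j])
    # farthest-point initialization keeps the sketch deterministic
    medoids = [0]
    while len(medoids) < k:
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)
        new = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                new.append(medoids[c])
                continue
            # medoid = member minimizing total within-cluster distance
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(members[within.argmin()])
        new = np.array(new)
        if np.array_equal(new, medoids):
            break
        medoids = new
    return D[:, medoids].argmin(axis=1)
```

ACA replaces the medoid step with kernel k-means under the dynamic time alignment kernel, and searches over segment boundaries as well, which is what makes the coordinate-descent and dynamic-programming machinery necessary.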

Code

Available here.

Results

Decomposing a motion capture (mocap) sequence of a walking person into 4 clusters of action segments.

This sequence is taken from the CMU Motion Capture Database. The quaternions of 14 joints [3] are computed as the per-frame features.

Download the [Video 3MB].
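The exact feature computation follows [3]. As a hedged sketch of how per-frame quaternion features might be assembled (the roll-pitch-yaw Euler convention and the `frame_feature` layout are assumptions for illustration, not the authors' code):

```python
import numpy as np

def euler_to_quat(rx, ry, rz):
    """Unit quaternion (w, x, y, z) from Euler angles in radians.

    Assumes a roll-pitch-yaw (intrinsic Z-Y-X) convention."""
    cr, sr = np.cos(rx / 2), np.sin(rx / 2)
    cp, sp = np.cos(ry / 2), np.sin(ry / 2)
    cy, sy = np.cos(rz / 2), np.sin(rz / 2)
    return np.array([
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    ])

def frame_feature(joint_angles):
    """Concatenate per-joint quaternions into one frame feature.

    joint_angles: (14, 3) array of Euler angles, one row per joint.
    Returns a 56-D vector (14 joints x 4 quaternion components)."""
    return np.concatenate([euler_to_quat(*row) for row in joint_angles])
```

Quaternions avoid the discontinuities of Euler angles, which makes per-frame features better behaved under the frame-to-frame distances used for clustering.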

Decomposing a mocap sequence into 8 clusters of action segments.

You can reproduce the same result using the function demoMocap.m in the code. More results are available here.

Download the [Video 10MB].

Decomposing a video sequence into 8 clusters of action segments.

This sequence is synthesized by concatenating several sequences randomly selected from the Weizmann Action Database. The silhouette of the person is extracted as the feature in each frame. See this [Video 5MB] for a visualization of the feature.

Download the [Video 4MB].
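The sketch below shows one plausible way to turn a frame into a silhouette-based feature vector; the background-subtraction threshold and the fixed 32×32 resampling are illustrative assumptions, not the pipeline used here:

```python
import numpy as np

def silhouette(frame, background, thresh=30):
    """Binary foreground mask via simple background subtraction."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

def silhouette_feature(mask, size=(32, 32)):
    """Crop the mask to its bounding box and resample to a fixed grid,
    so every frame becomes a comparable fixed-length vector."""
    ys, xs = np.nonzero(mask)
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    ri = np.linspace(0, crop.shape[0] - 1, size[0]).round().astype(int)
    ci = np.linspace(0, crop.shape[1] - 1, size[1]).round().astype(int)
    return crop[np.ix_(ri, ci)].ravel().astype(float)
```

Cropping to the bounding box before resampling makes the feature invariant to where the person stands in the frame and to moderate scale changes.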

Decomposing a video sequence into 6 clusters of action segments.

This sequence is synthesized by concatenating several sequences randomly selected from the KTH Action Database. The optical flow of patches around the person is extracted as the feature. See this [Video 4MB] for a visualization of the feature.

Download the [Video 4MB].
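As an illustration of patch-based optical-flow features, here is a single-step Lucas-Kanade least-squares sketch; the patch grid layout and sizes are hypothetical, not the extraction pipeline used for this result:

```python
import numpy as np

def patch_flow(prev, curr):
    """One translational flow vector (vx, vy) for a patch.

    Solves the brightness-constancy least squares
    Ix * vx + Iy * vy = -It over all pixels of the patch."""
    Ix = np.gradient(prev, axis=1)
    Iy = np.gradient(prev, axis=0)
    It = curr - prev
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    v, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return v

def flow_feature(prev, curr, grid=2, size=16):
    """Concatenate patch flows over a grid of patches (hypothetical layout)."""
    feats = []
    for i in range(grid):
        for j in range(grid):
            p = prev[i * size:(i + 1) * size, j * size:(j + 1) * size]
            c = curr[i * size:(i + 1) * size, j * size:(j + 1) * size]
            feats.append(patch_flow(p, c))
    return np.concatenate(feats)
```

Unlike silhouettes, flow features capture motion direction and magnitude, which helps distinguish actions whose poses look similar frame by frame.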

Publications

References