TrajeVAE: Controllable Human Motion Generation with Trajectories

Supplementary Material

Kacper Kania1
Marek Kowalski2
Tomasz Trzciński3,4
Warsaw University of Technology1   Microsoft2   Tooploox3   Jagiellonian University4

Reconstructed sequences (Figure 3.)

The first row shows how adding trajectories one by one changes the reconstructed sequence. Since our model predicts the sequence from both the trajectories and the initial pose, even if no trajectories are provided, TrajeVAE still generates plausible motion. We conclude that an initial pose provides a strong prior towards a particular motion. To confirm that TrajeVAE uses both the information about the trajectory and the initial pose, we show in the second row how these motions change when we set trajectories such that $\mathbf{y}_t = \mathbf{y}_1$ for $t \geq 2$. When we provide four trajectories in such a setting, the skeleton freezes for all time steps. Small variations in the pose for four trajectories are caused by the sampling process.
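The frozen-trajectory setting used in the second row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a trajectory is stored as a list of (x, y, z) positions, one per time step, and the helper name `freeze_trajectory` and the sample values are hypothetical.

```python
def freeze_trajectory(trajectory):
    """Return a copy in which every time step repeats the first position.

    This mirrors the setting y_t = y_1 for t >= 2 described in the text.
    The list-of-positions layout is an assumption made for this sketch.
    """
    first = trajectory[0]
    return [list(first) for _ in trajectory]


# Hypothetical right-foot trajectory: 4 time steps of (x, y, z) positions.
right_foot = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.2, 0.05, 0.0],
    [0.3, 0.0, 0.0],
]
frozen = freeze_trajectory(right_foot)
```

Feeding such frozen trajectories for all four controlled joints corresponds to the case in which the skeleton stays still for all time steps.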

Ground truth sequence

0 trajectories

1 trajectory

2 trajectories

3 trajectories

4 trajectories

Diversity vs. number of input trajectories (Figure 4.)

Each row shows samples generated from a different set of trajectories. The first row shows samples where no trajectories are provided, and the second row shows samples where only a single trajectory is given. We add trajectories progressively in the following order: right foot, left foot, right hand, and left hand.
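The progressive order of conditioning sets can be written down explicitly. The sketch below only enumerates which joints are controlled in each row; the joint names and the function `trajectory_sets` are hypothetical labels chosen for illustration.

```python
# Order in which trajectories are added, as stated in the text.
# The string labels themselves are an assumption of this sketch.
JOINT_ORDER = ["right_foot", "left_foot", "right_hand", "left_hand"]


def trajectory_sets(order=JOINT_ORDER):
    """List the conditioning sets, from no trajectories up to all of them."""
    return [list(order[:k]) for k in range(len(order) + 1)]


rows = trajectory_sets()
for k, joints in enumerate(rows):
    print(k, joints)
```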

Ground truth sequence

10 sampled sequences

Deterministic prediction

Generalization (Figure 5.)

Predicted sequence

Ground truth sequence

Same pose, different trajectories (Appendix)

We additionally show example sequences generated for swapped trajectories, where the original trajectory and the new one differ by less than $\epsilon_0 = 0.01$ at the time step $t = 0$. The ground truth sequence with new trajectories is shown to visualize the discrepancy between the sequence and the corresponding new trajectory. The last row shows the original ground truth sequences for these trajectories.
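One way to select such replacement trajectories is sketched below. The text does not say which distance is used, so the Euclidean distance between starting positions is an assumption here, as are the function names and the sample data.

```python
import math

EPSILON_0 = 0.01  # threshold from the text


def close_at_start(traj_a, traj_b, eps=EPSILON_0):
    """True if the two trajectories start less than eps apart.

    Euclidean distance at the first time step is an assumption of this sketch.
    """
    return math.dist(traj_a[0], traj_b[0]) < eps


def find_swappable(original, candidates, eps=EPSILON_0):
    """Indices of candidate trajectories that start close to the original."""
    return [i for i, c in enumerate(candidates) if close_at_start(original, c, eps)]


# Hypothetical data: each trajectory is a list of (x, y, z) positions.
original = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
candidates = [
    [[0.005, 0.0, 0.0], [0.0, 1.0, 0.0]],  # starts 0.005 away
    [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]],    # starts 0.5 away
    [[0.0, 0.02, 0.0], [0.0, 0.0, 1.0]],   # starts 0.02 away
]
swappable = find_swappable(original, candidates)
```

Only candidates that start near the original are kept, so the initial pose remains consistent with the new trajectory while the rest of the motion may differ.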

Predicted sequence with new trajectories

Ground truth sequence with new trajectories

The original ground truth sequence for the given trajectories


The creation of plausible and controllable 3D human motion animations is a long-standing problem that requires manual intervention by skilled artists. Current machine learning approaches can semi-automate the process, but they are limited in a significant way: they can handle only a single trajectory of the expected motion, which precludes fine-grained control over the output. To mitigate that issue, we reformulate the problem of future pose prediction into pose completion in space and time, where multiple trajectories are represented as poses with missing joints. We show that such a framework can generalize to other neural networks designed for future pose prediction. Once trained in this framework, a model is capable of predicting sequences from any number of trajectories. We propose a novel transformer-like architecture, TrajeVAE, that builds on this idea and provides a versatile framework for 3D human animation. We demonstrate that TrajeVAE offers better accuracy than the trajectory-based reference approaches and methods that base their predictions on past poses. We also show that it can predict reasonable future poses even if provided only with an initial pose.
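The idea of representing trajectories as poses with missing joints can be illustrated with a small sketch. It assumes a sequence is stored as a list of poses, each pose a list of (x, y, z) joint positions, that the full initial pose is always visible, and that missing joints are zero-filled and flagged in a boolean mask. These layout choices and the name `build_completion_input` are assumptions for illustration, not the authors' implementation.

```python
def build_completion_input(poses, trajectory_joints):
    """Hide every joint that is neither in the initial pose nor on a trajectory.

    poses: list over time of poses; each pose is a list of [x, y, z] joints.
    trajectory_joints: set of joint indices whose trajectories are provided.
    Returns (masked_poses, mask), where mask[t][j] is True if joint j is
    visible at time step t. Zero-filling hidden joints is an assumption.
    """
    masked, mask = [], []
    for t, pose in enumerate(poses):
        row, row_mask = [], []
        for j, joint in enumerate(pose):
            visible = (t == 0) or (j in trajectory_joints)
            row.append(list(joint) if visible else [0.0, 0.0, 0.0])
            row_mask.append(visible)
        masked.append(row)
        mask.append(row_mask)
    return masked, mask


# Hypothetical sequence: 2 time steps, 3 joints; only joint 2 is controlled.
poses = [
    [[0.0, 1.0, 0.0], [0.2, 0.5, 0.0], [0.1, 0.0, 0.0]],
    [[0.0, 1.0, 0.1], [0.2, 0.5, 0.1], [0.1, 0.0, 0.2]],
]
masked, mask = build_completion_input(poses, trajectory_joints={2})
```

With an empty `trajectory_joints` set, only the initial pose is visible, which matches the case in which the model predicts future poses from an initial pose alone.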


    title={TrajeVAE: Controllable Human Motion Generation from Trajectories},
    author={Kania, Kacper and Kowalski, Marek and Trzci{\'n}ski, Tomasz},
    journal={arXiv preprint arXiv:2104.00351},


This work was supported by Microsoft Research through its EMEA PhD Scholarship Programme. We thank Eric Hedlin and members of CVLab at Warsaw University of Technology for insightful comments.