The first row shows how adding trajectories one by one changes the reconstructed sequence. Since our model predicts the sequence from both the trajectories and the initial pose, even if no trajectories are provided, TrajeVAE still generates plausible motion. We conclude that an initial pose provides a strong prior towards a particular motion. To confirm that TrajeVAE uses both the information about the trajectory and the initial pose, we show in the second row how these motions change when we set trajectories such that $\mathbf{y}_t = \mathbf{y}_1$ for $t \geq 2$. When we provide four trajectories in such a setting, the skeleton freezes for all time steps. Small variations in the pose for four trajectories are caused by the sampling process.
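The frozen-trajectory setting described above, $\mathbf{y}_t = \mathbf{y}_1$ for $t \geq 2$, can be illustrated with a minimal NumPy sketch. The function name `freeze_trajectories` and the array layout `(T, K, 3)` (time steps, tracked joints, 3D coordinates) are assumptions made for illustration and are not taken from the TrajeVAE code.

```python
import numpy as np


def freeze_trajectories(trajectories: np.ndarray) -> np.ndarray:
    """Repeat the first time step of every trajectory for the whole sequence.

    trajectories: array of shape (T, K, 3), an assumed layout with T time
    steps, K tracked joints, and 3D coordinates.
    Returns a new array of the same shape where frame t equals frame 0.
    """
    first_frame = trajectories[:1]  # shape (1, K, 3)
    return np.repeat(first_frame, trajectories.shape[0], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=(5, 4, 3))  # 5 time steps, 4 trajectories
    frozen = freeze_trajectories(y)
    # Every frame of the frozen input matches the first frame of the original.
    print(np.allclose(frozen, y[0]))
```

Feeding such an input to a model that respects its trajectories should yield a nearly static skeleton when all four trajectories are given, which matches the behaviour reported in the second row.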
Ground truth sequence
0 trajectories
1 trajectory
2 trajectories
3 trajectories
4 trajectories
Each row represents samples from a different set of trajectories. The first row shows samples where no trajectories are provided, and the second row shows samples where only a single trajectory is given. We add trajectories progressively in the order: right foot, left foot, right hand, and left hand.
Ground truth sequence
10 sampled sequences
Deterministic prediction
Predicted sequence
Ground truth sequence
We additionally show example generated sequences for trajectories where the original trajectory and another one differ by less than $\epsilon_0 = 0.01$ at time step $t = 0$. The ground truth sequence with new trajectories is shown to visualize the discrepancy between the sequence and the corresponding new trajectory. The last row shows the original ground truth sequences for these trajectories.
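One way to select such alternative trajectories is sketched below. The distance measure (Euclidean norm over all tracked joints at the first time step), the function name `find_close_starts`, and the array layouts are assumptions for illustration; the text only states that the trajectories differ by less than $\epsilon_0 = 0.01$ at $t = 0$.

```python
import numpy as np


def find_close_starts(query: np.ndarray, candidates: np.ndarray,
                      eps: float = 0.01) -> np.ndarray:
    """Return indices of candidates that start within eps of the query.

    query:      (T, K, 3) trajectories of the original sequence (assumed layout).
    candidates: (N, T, K, 3) trajectories from other sequences.
    The comparison uses only the first time step, and the Euclidean norm
    over all K joints is an assumed choice of distance.
    """
    diff = candidates[:, 0] - query[0]  # (N, K, 3)
    dist = np.linalg.norm(diff.reshape(len(candidates), -1), axis=1)
    return np.flatnonzero(dist < eps)


if __name__ == "__main__":
    query = np.zeros((3, 2, 3))
    candidates = np.zeros((3, 3, 2, 3))
    candidates[1, 0] += 0.001  # starts very close to the query
    candidates[2, 0] += 1.0    # starts far from the query
    print(find_close_starts(query, candidates))
```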
Predicted sequence with new trajectories
Ground truth sequence with new trajectories
The original ground truth sequence for the given trajectories
The creation of plausible and controllable 3D human motion animations is a long-standing problem that requires manual intervention by skilled artists. Current machine learning approaches can semi-automate the process, but they are limited in a significant way: they can handle only a single trajectory of the expected motion, which precludes fine-grained control over the output. To mitigate that issue, we reformulate the problem of future pose prediction into pose completion in space and time, where multiple trajectories are represented as poses with missing joints. We show that such a framework can generalize to other neural networks designed for future pose prediction. Once trained in this framework, a model is capable of predicting sequences from any number of trajectories. We propose a novel transformer-like architecture, TrajeVAE, that builds on this idea and provides a versatile framework for 3D human animation. We demonstrate that TrajeVAE offers better accuracy than the trajectory-based reference approaches and methods that base their predictions on past poses. We also show that it can predict reasonable future poses even if provided only with an initial pose.
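The idea of representing trajectories as poses with missing joints can be sketched as follows. This is an illustrative assumption about the input encoding, not the TrajeVAE implementation: missing joints are zero-filled and accompanied by a boolean mask, the initial pose is kept complete, and the name `poses_to_trajectory_input` is hypothetical.

```python
import numpy as np


def poses_to_trajectory_input(poses: np.ndarray, visible_joints):
    """Turn a full pose sequence into a pose-completion input.

    poses:          (T, J, 3) array of joint positions (assumed layout).
    visible_joints: indices of joints whose trajectories are provided.
    Returns (masked, mask): masked has zeros where a joint is missing
    (zero-filling is an assumed encoding), and mask is a (T, J) boolean
    array marking observed joints. The first pose is kept fully observed.
    """
    num_steps, num_joints, _ = poses.shape
    keep = np.asarray(list(visible_joints), dtype=int)
    mask = np.zeros((num_steps, num_joints), dtype=bool)
    mask[:, keep] = True  # provided trajectories are observed at every step
    mask[0, :] = True     # the initial pose is always complete
    masked = np.where(mask[..., None], poses, 0.0)
    return masked, mask


if __name__ == "__main__":
    poses = np.ones((4, 5, 3))
    # Two trajectories provided (joint indices 1 and 3 are arbitrary here).
    masked, mask = poses_to_trajectory_input(poses, [1, 3])
    print(mask.sum(axis=1))  # observed joints per time step
    # No trajectories: only the initial pose remains observed.
    _, mask_none = poses_to_trajectory_input(poses, [])
    print(mask_none.sum(axis=1))
```

Because the number of visible joints is just a property of the mask, the same input format covers zero, one, or several trajectories, which is what lets a single model handle any number of them.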
@article{kania2021trajevae,
  title={TrajeVAE: Controllable Human Motion Generation from Trajectories},
  author={Kania, Kacper and Kowalski, Marek and Trzci{\'n}ski, Tomasz},
  journal={arXiv preprint arXiv:2104.00351},
  year={2021}
}
This work was supported by Microsoft Research through its EMEA PhD Scholarship Programme. We thank Eric Hedlin and members of CVLab at Warsaw University of Technology for insightful comments.