3D human pose estimation is a key problem in computer vision: it aims to recover the 3D body configuration of human subjects from single images, depth images, or videos. It has a wide range of applications in surveillance, animation, human-computer interaction, and sports analysis such as athletic training.
This project aims to estimate the 3D pose of interacting people from monocular RGB images. Multi-person 3D pose estimation comes with a variety of challenges, including body part ambiguities, self-occlusions, and severe person-to-person occlusions, which make human-to-human interactions more significant for pose prediction. The intuition is that closely interacting people carry information about each other’s body pose, and pose estimation methods can leverage this mutual information to resolve body part occlusions. Modelling the dependencies between interacting people has not been exploited to its full extent, and this is a key element of this project.
Building on recent advances in computer-vision-aided training of athletes, our goal is to estimate the 3D pose of boxers in sparring scenarios to automate their performance analysis. One way to encode the motion dependencies is to exploit temporal information and combine it using optical flow [1]. Because movements in boxing sparring are fast, the optical flow should preserve the integrity of the body parts that move faster than the rest of the body. Event cameras can be used to estimate the required optical flow reliably and at high frame rates.
The candidate will build on our existing single-person pose estimation method [2,3], which relies on deep neural networks, and extend it to multi-person scenarios, using optical flow to account for the motion dependencies between human subjects.
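To illustrate how optical flow can carry pose evidence from one frame to the next, in the spirit of [1], the sketch below warps a joint heatmap from the previous frame into the current one and averages the two. It is only a minimal illustration, not the method to be developed in this project: the function names `warp_heatmap` and `fuse_heatmaps`, the backward-flow convention, the nearest-neighbour sampling, and the fixed fusion weight are all assumptions made for the example.

```python
import numpy as np


def warp_heatmap(prev_heatmap, backward_flow):
    """Warp a joint heatmap from frame t-1 into frame t.

    Assumed convention: backward_flow[y, x] = (dx, dy) points from pixel
    (x, y) in frame t to its source location in frame t-1.
    """
    h, w = prev_heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup of the source pixel, clipped to the image bounds.
    src_x = np.clip(np.rint(xs + backward_flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + backward_flow[..., 1]).astype(int), 0, h - 1)
    return prev_heatmap[src_y, src_x]


def fuse_heatmaps(cur_heatmap, prev_heatmap, backward_flow, weight=0.5):
    """Blend the current heatmap with the flow-aligned previous heatmap."""
    warped = warp_heatmap(prev_heatmap, backward_flow)
    return weight * cur_heatmap + (1.0 - weight) * warped


# Toy example: a joint moves two pixels to the right between frames.
prev = np.zeros((5, 5))
prev[2, 1] = 1.0                 # joint at (y=2, x=1) in frame t-1
cur = np.zeros((5, 5))
cur[2, 3] = 1.0                  # joint at (y=2, x=3) in frame t
flow = np.zeros((5, 5, 2))
flow[..., 0] = -2.0              # every pixel in frame t came from 2 px to the left
fused = fuse_heatmaps(cur, prev, flow)
```

In the toy example the warped previous heatmap lines up with the current one, so the fused map keeps a single peak at the joint's current position. A learned fusion layer or bilinear sampling could replace the fixed average and nearest-neighbour lookup used here.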
References:
[1] Pfister et al., "Flowing ConvNets for Human Pose Estimation in Videos" ICCV 2015.
[2] Tekin et al., "Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation" ICCV 2017.
[3] Tekin et al., "Structured Prediction of 3D Human Pose with Deep Neural Networks" BMVC 2016.
The candidate should have programming experience, ideally in Python. Previous experience with machine learning and computer vision is a plus.
30% Theory, 30% Implementation, 40% Research and Experiments